Is it possible to group by two columns? - sql

I have this table:
ID   Price   Time
0    20,00   20/10/10
1    20,00   20/10/10
2    20,00   12/12/10
3    14,00   23/01/12
4    87,00   30/07/14
4    20,00   30/07/14
I use this SQL to get the list of all prices without repeated values:
SELECT * FROM myTable WHERE id in (select min(id) from myTable group by Price)
This code returns the values (20, 14, 87, 20).
But I would like to add another check, so that it groups not only by price but also by date. The syntax above gets the list by price; if I find a way to also check by date, the code should return the values (20, 20, 14, 87, 20).
The result repeats 20 only two times, but if we look at the table there are three rows with the value 20 (two with the date 20/10/10 and one with the date 12/12/10), and that is exactly what I want to get!
Could somebody help me?

To group by multiple columns, just separate the columns with commas in the list.
SELECT price FROM myTable group by price, time order by time
The GROUP BY looks at all distinct combinations of the listed columns' values and discards duplicates. You can also use aggregate functions like SUM or MAX to pull additional columns into the results.

The following should work as long as all you need is the price/time combination. If you need to include the ID, things get more complicated:
SELECT `Price` FROM items
GROUP BY `Price`, `Time`
ORDER BY `Time`;
Here's a fiddle with the result in action: http://sqlfiddle.com/#!2/40821/1
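If you also need an ID in the result, one option (hinted at above) is to pull it in with an aggregate function rather than selecting it directly. A minimal sketch, assuming the items table from the answer above has the same ID column as in the question and that the smallest ID per price/time group is acceptable:
-- one row per distinct (Price, Time) pair, keeping the lowest ID in each group
SELECT MIN(`ID`) AS `ID`, `Price`, `Time`
FROM items
GROUP BY `Price`, `Time`
ORDER BY `Time`;
This keeps one row per price/time combination and picks the lowest ID within each group, which mirrors the min(id) idea in the original query.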

Related

SQL Grabbing unique counts per category

I'm pretty new to SQL and Redshift, but I'm running into a weird problem.
My data looks like the below. Ignore the actual id and date_time values... I just put in random info, but it's the same format.
id   date_time (varchar 255)
1    2019-01-11T05:01:59
1    2019-01-11T05:01:59
2    2019-01-11T05:01:59
3    2019-01-11T05:01:59
1    2019-02-11T05:01:59
2    2019-02-11T05:01:59
I'm trying to get the number of counts of unique ID's per month.
I've tried the command below. Given the amount of data, I just tried to do a demo on the first 10 rows of my table...
SELECT COUNT(DISTINCT id),
       LEFT(date_time, 7)
FROM (SELECT TOP 10 * FROM myTable.ME)
GROUP BY LEFT(date_time, 7), id
I expect something like below.
count   left
3       2019-01
2       2019-02
But instead I'm getting something similar to what's below.
I then tried the below command which seems correct.
SELECT COUNT(DISTINCT id),
       LEFT(date_time, 7)
FROM (SELECT TOP 1000000 * FROM myTable.ME)
GROUP BY LEFT(date_time, 7)
However, if you remove the DISTINCT portion, you get the results below. It seems like it is only looking at a certain month (2019-01), rather than other months.
If anyone can tell me what is wrong with the commands I'm using or can give me the correct command, I'll be very grateful. Thank you.
EDIT: Could it possibly be because my data isn't clean?
Why are you using a string for the date? That is simply wrong. There are built-in types. But assuming you have some reason or cannot change it, use string functions:
select left(date_time, 7) as yyyymm,
count(distinct id)
from t
group by yyyymm
order by yyyymm;
In your first query you have id in the GROUP BY, which does not do what you want.
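If you can convert the column to a real timestamp, a cleaner approach is to cast and truncate to the month. A sketch, assuming the varchar always holds ISO-8601 values like the samples above (the cast will fail on rows that don't parse):
select date_trunc('month', cast(date_time as timestamp)) as month_start,
       count(distinct id)
from t
group by month_start
order by month_start;
This gives you a proper date column to sort and join on, instead of a truncated string.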

SELECT MIN from a subset of data obtained through GROUP BY

There is a database in place with hourly timeseries data, where every row in the DB represents one hour. Example:
TIMESERIES TABLE
id   date_and_time      entry_category
1    2017/01/20 12:00   type_1
2    2017/01/20 13:00   type_1
3    2017/01/20 12:00   type_2
4    2017/01/20 12:00   type_3
First I used the GROUP BY statement to find the latest date and time for each type of entry category:
SELECT MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category;
Now, however, I want to find the date and time that is the LEAST RECENT among the datetimes I obtained with the query above. I will need to use SELECT MIN(date_and_time) somehow, but how do I tell SQL to treat the output of my previous query as a "new table" to run a new SELECT query on? The output of the whole query should be a single value; for the sample above, date_and_time = 2017/01/20 12:00.
I've tried using aliases, but they don't seem to do the trick; they only rename existing columns or tables (or I'm misusing them). There are many questions out there about listing the MAX or MIN for a particular group (e.g. https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ or Select max value of each group), which is what I have already achieved, but now I want to work on this list of obtained datetimes. My database structure is very simple, but I lack the knowledge to string these queries together.
Thanks, cheers!
You can use your first query as a sub-query; this is similar to what you are describing, using the first query's output as the input for the second query. Here you will get the one-row output of the min date, as required.
SELECT MIN(date_and_time)
FROM (SELECT MAX(date_and_time) AS date_and_time, entry_category
      FROM timeseries_table
      GROUP BY entry_category) a;
Is this what you want?
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC;
If there are ties on the earliest of the maximum dates, this returns an arbitrary one of the tied rows. If you want a deterministic result, include an additional sort key:
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC, entry_category;
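TOP is SQL Server / MS Access syntax; if the database is MySQL or PostgreSQL, the same idea can be written with LIMIT. A sketch, assuming the same timeseries_table as in the question:
-- latest entry per category, then keep only the oldest of those
SELECT MAX(date_and_time) AS latest, entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY latest ASC, entry_category
LIMIT 1;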

Single column values into multiple columns in hive

I have a table which updates on a weekly basis, and I need to check the count variation between one week's values and the previous week's. I just did the below...
Select
    case when F.wk_end_d = max(F.wk_end_d) over (partition by F.wk_end_d)
         then F.the_count end as count
from
(
    select wk_end_d, count(*) as the_count
    from table A
    where wk_end_d between date_sub('2019-03-02', 7) and '2019-03-02'
    group by wk_end_d
) F
which gives me values like the below:
100
200
but I need to get those values (100 and 200) in 2 different columns, because I need to build some other calculations on top of them.
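One possible way to pivot the two weekly counts into separate columns is conditional aggregation. A sketch, reusing the table and date window from the query above; the output column names prev_week_count and curr_week_count are made up, and it assumes only the two week-end dates fall inside that window:
select max(case when t.wk_end_d = date_sub('2019-03-02', 7) then t.the_count end) as prev_week_count,
       max(case when t.wk_end_d = '2019-03-02' then t.the_count end) as curr_week_count
from (
    -- one row per week-end date with its row count
    select wk_end_d, count(*) as the_count
    from table A
    where wk_end_d between date_sub('2019-03-02', 7) and '2019-03-02'
    group by wk_end_d
) t;
Each MAX(CASE ...) picks out the count for one specific week, so the two values land side by side in a single row.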

How do I count data from 2 different tables by date

I have 2 tables with no relations. Both tables have a different number of columns, but a few columns are the same while holding different data. I was able to create a function or view of only the data I wanted, but when I try to count the data filtered by date, I always get the wrong count in return. Let me explain by showing the 2 functions and what I'm trying to do:
Function 1
  ID        - number from 1 to 8
  data sent - YES or NO
  Date      - date value
Function 2
  ID        - number from 1 to 8
  data sent - yes or no
  date      - date value
Upon running both separately, I get all the rows from the tables and everything looks good.
Then I try to add the following to each function:
select
count([data sent]), ID
from function1
Where (date between #date1 and #date2)
group by ID
The above statement works great and gives me the right result for each function.
Now I thought what if I want to add those 2 functions into one and get the count from both functions on 1 page.
So I created the following function:
Function 3
select
    count(Function1.[data sent]) as Expr1,
    Function1.id,
    count(Function2.[data sent]) as Expr2,
    Function1.date
from Function1
    LEFT OUTER JOIN Function2 on Function1.id = Function2.id
where (Function1.date between #date1 and #date2)
group by Function1.id
Upon running the above, I get the following table:
ID Expr1 Expr2
On both Expr1 and Expr2, I get results which I am not sure where they come from. I guess something is being multiplied by 100000 since one table holds almost 15000 rows and the other around 5000 rows.
What I would like to know first is if it possible at all to be able to filter by date and count records from both table at the same time. If anyone need more information please let me know and I will be glad to share and explain more.
Thank you
The LEFT OUTER JOIN is taking each row of the left table, finding ALL of the rows in the right table with the same id field, and creating that many rows in the result table. Since id isn't what we usually think of as an identity field (it looks more like a "deviceId" or something), you'll get lots of matches for each one. Repeat 15000 times and you get your combinatorial explosion.
Tip: To debug things like this, you can create sample tables with a tiny subset of the real data, say 10 rows from each, and run your query on them. You'll see the issue immediately.
It's possible to filter by date. It's hard to recommend an actual solution without better understanding your phrase "I want to add those 2 functions into one and get the count from both functions on 1 page".
Why can't you create a temporary table for each function then join them together?
Maybe subqueries can help you to achieve what you want:
SELECT
    ID = COALESCE(f1.ID, f2.ID),
    Date = COALESCE(f1.Date, f2.Date),
    f1.Expr1,
    f2.Expr2
FROM (
    SELECT
        ID,
        Date,
        Expr1 = COUNT([data sent])
    FROM Function1
    WHERE Date BETWEEN #date1 AND #date2
    GROUP BY ID, Date
) f1
FULL JOIN (
    SELECT
        ID,
        Date,
        Expr2 = COUNT([data sent])
    FROM Function2
    WHERE Date BETWEEN #date1 AND #date2
    GROUP BY ID, Date
) f2
    ON f1.ID = f2.ID AND f1.Date = f2.Date
This query also uses full (outer) join instead of left join, in case the right side of the join contains rows that have no match in the left side (and you want those rows).

Query return rows whose sum of column value match given sum

I have a table with:
id   desc      total
1    baskets   25
2    baskets   15
3    baskets   75
4    noodles   10
I would like a query whose output rows add up to a total of 40.
The output would be like:
id   desc      total
1    baskets   25
2    baskets   15
I believe this will get you a list of the results you're looking for, but not with your example dataset, because no desc group in your example sums to exactly 40.
SELECT id, `desc`, total
FROM mytable
WHERE `desc` IN (
    SELECT `desc`
    FROM mytable
    GROUP BY `desc`
    HAVING SUM(total) = 40
)
Select `Desc`, SUM(Total) as SumTotal
from Table
group by `Desc`
having SUM(Total) >= 40
Not quite sure what you want, but this may get you started
SELECT `desc`, SUM(Total) Total
FROM TableName
GROUP BY `desc`
HAVING SUM(Total) = 40
From reading your question, it sounds like you want a query that returns any subset of rows that adds up to a certain target value and shares the same description.
There is no simple way to do this. This moves into algorithmic territory.
Assuming I am correct about what you are after, GROUP BYs and aggregate functions will not solve your problem. SQL has no way to say that a query should be run over subsets of the data, exhausting all possible combinations until it finds the sums that match your requirement.
You will have to mix an algorithm into your SQL, i.e. a stored procedure.
Or simply get all the data that fits the desc from the database, then run your algorithm on it in code.
I recall a CS algorithms class I took where this was a known problem.
I believe you could just adapt a working version of this algorithm to solve your problem:
http://en.wikipedia.org/wiki/Subset_sum_problem
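For what it's worth, a dialect with recursive CTEs can express a brute-force version of that subset-sum search directly in SQL. This is only a sketch, assuming MySQL 8+ syntax (hence the backticks around the reserved word desc) and a hypothetical mytable with the columns from the question; it enumerates every subset per description, so it is only workable for small groups:
WITH RECURSIVE subsets AS (
    -- start a candidate subset at every row
    SELECT id AS last_id, `desc` AS descr, total AS running_total,
           CAST(id AS CHAR(200)) AS ids
    FROM mytable
    UNION ALL
    -- extend each subset with a later row that has the same description
    SELECT t.id, s.descr, s.running_total + t.total,
           CONCAT(s.ids, ',', t.id)
    FROM subsets s
    JOIN mytable t
      ON t.`desc` = s.descr
     AND t.id > s.last_id
)
SELECT ids, descr, running_total
FROM subsets
WHERE running_total = 40;
The ids column lists which rows make up each matching subset; for the sample data it would return the combination of ids 1 and 2.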
-- assumes the table is named mytable and that descriptions whose total is exactly 40 are wanted
select `desc`
from (select `desc`, sum(total) as ct
      from mytable
      group by `desc`) t
where ct = 40;