SQL Grabbing unique counts per category - sql

I'm pretty new to SQL and Redshift, but there is a weird problem I'm getting.
My data looks like the sample below. Ignore the actual id and date_time values; I just put in random data, but it's in the same format.
id date_time (varchar(255))
1 2019-01-11T05:01:59
1 2019-01-11T05:01:59
2 2019-01-11T05:01:59
3 2019-01-11T05:01:59
1 2019-02-11T05:01:59
2 2019-02-11T05:01:59
I'm trying to get the count of unique IDs per month.
I tried the command below. Given the amount of data, I only ran it against the first 10 rows of my table as a demo.
SELECT COUNT(DISTINCT id),
LEFT(date_time,7)
FROM ( SELECT TOP 10 *
FROM myTable.ME )
GROUP BY LEFT(date_time, 7), id
I expect something like below.
count left
3 2019-01
2 2019-02
But instead I'm getting something similar to what's below
I then tried the command below, which seems correct.
SELECT COUNT(DISTINCT id),
LEFT(date_time,7)
FROM ( SELECT TOP 1000000 *
FROM myTable.ME )
GROUP BY LEFT(date_time, 7)
However, if you remove the DISTINCT portion, you get the results below. It seems like it is only looking at a certain month (2019-01), rather than other months.
If anyone can tell me what is wrong with the commands I'm using or can give me the correct command, I'll be very grateful. Thank you.
EDIT: Could it be because my data isn't clean?

Why are you using a string for the date? That is simply wrong. There are built-in types. But assuming you have some reason or cannot change it, use string functions:
select left(date_time, 7) as yyyymm,
count(distinct id)
from t
group by yyyymm
order by yyyymm;
In your first query you have id in the GROUP BY, which does not do what you want: it produces one row per month and id, so every COUNT(DISTINCT id) is 1.
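To make the difference between the two GROUP BY clauses concrete, here is a minimal sketch using the six sample rows from the question. The temporary table name `t` and the column alias `ids` are placeholders, and the syntax assumes Redshift.

```sql
-- Sample rows from the question, loaded into a temporary table.
CREATE TEMP TABLE t (id INT, date_time VARCHAR(255));
INSERT INTO t VALUES
    (1, '2019-01-11T05:01:59'),
    (1, '2019-01-11T05:01:59'),
    (2, '2019-01-11T05:01:59'),
    (3, '2019-01-11T05:01:59'),
    (1, '2019-02-11T05:01:59'),
    (2, '2019-02-11T05:01:59');

-- Grouping by month AND id: one row per (month, id) pair,
-- so this returns five rows and every count is 1.
SELECT LEFT(date_time, 7) AS yyyymm, COUNT(DISTINCT id) AS ids
FROM t
GROUP BY LEFT(date_time, 7), id;

-- Grouping by month only: one row per month.
SELECT LEFT(date_time, 7) AS yyyymm, COUNT(DISTINCT id) AS ids
FROM t
GROUP BY LEFT(date_time, 7)
ORDER BY yyyymm;
-- 2019-01 | 3
-- 2019-02 | 2
```

Note also that TOP n without an ORDER BY returns an arbitrary set of rows, so a sample taken that way may well contain a single month only.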

Related

sql count function with subquery

here is my query
select narr,vocno,count(*)
from KontenLedger
WHERE VOCDT>'2018-07-01'
group by narr,vocno
having count(*)<'3'
If I write it as given above, the count is calculated per combination of the two fields ('narr' and 'vocno'). If I remove the field 'narr', the answer is correct. I need to show the field 'narr' as well, without it affecting the count.
Without knowing your database, some sample data, or the expected output, this is only a guess:
SELECT
vocno,
COUNT(*) AS total,
MAX(narr) AS max_narr
FROM KontenLedger
WHERE vocdt > '2018-07-01'
GROUP BY vocno
HAVING COUNT(*) < 3
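If every narr value should stay visible rather than just MAX(narr), a window function is one possible alternative. The sketch below assumes a database with window function support; the column types and sample rows are hypothetical, since the real KontenLedger schema is not shown in the question.

```sql
-- Hypothetical sample data; the real KontenLedger schema is not shown.
CREATE TABLE KontenLedger (narr VARCHAR(50), vocno INT, vocdt DATE);
INSERT INTO KontenLedger (narr, vocno, vocdt) VALUES
    ('rent',    1, '2018-08-01'),
    ('deposit', 1, '2018-08-02'),
    ('rent',    2, '2018-08-03'),
    ('fees',    2, '2018-08-04'),
    ('tax',     2, '2018-08-05');

-- Count rows per vocno with a window function, so narr stays visible
-- on every row without being part of the grouping.
SELECT narr, vocno, total
FROM (
    SELECT narr,
           vocno,
           COUNT(*) OVER (PARTITION BY vocno) AS total
    FROM KontenLedger
    WHERE vocdt > '2018-07-01'
) AS counted
WHERE total < 3;
-- Returns the two rows of vocno 1 (total = 2);
-- vocno 2 has three rows and is filtered out.
```

The filter has to sit in an outer query because a window function cannot be referenced in WHERE or HAVING of the same SELECT.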

How to select the correct date in the same column when data in different rows are equal

I have the following problem with my data in a DB2 database. I want to create an overview of when a machine was used for a project, with a begin and end date.
The following data is available:
||Machine name||Description||Project||Start date||
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-03-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|16-03-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|24-04-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_2|13-05-2016|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|22-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|12-06-2017|
The result that I'm looking for is:
||Machine name||Description||Project||Start date||Last date||
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-03-2017|07-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_2|13-05-2016|13-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|22-05-2017|12-06-2017|
Does anybody have an idea how to create this result with a statement?
This is a classic gaps-and-islands problem, and the standard solutions will work just fine:
WITH Grouped_Run AS (SELECT name, description, project, test, executedOn,
ROW_NUMBER() OVER(ORDER BY executedOn) -
ROW_NUMBER() OVER(PARTITION BY name, description, project, test ORDER BY executedOn) AS groupingId
FROM Machine)
SELECT name, description, project, test, MIN(executedOn) AS testStart, MAX(executedOn) AS testEnd
FROM Grouped_Run
GROUP BY name, description, project, test, groupingId
ORDER BY testStart
This will produce the results you're looking for (it's a little unclear whether the group should be the whole row, but that's adjustable).
Note that depending on what specific version you're on, there may be other/faster ways to achieve these results.
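To see why the difference of the two ROW_NUMBER() values identifies each run, the following sketch lists both row numbers and the resulting groupingId for the sample rows. It assumes DB2 syntax (a CTE built from VALUES) and assumes the TEST_2 row is dated 13-05-2017 so that it falls between the two TEST_1 runs; the names overallRn and perTestRn are illustrative only.

```sql
-- Sample rows from the question. Assumption: the TEST_2 run is dated
-- 2017-05-13, so it sits between the two TEST_1 runs.
WITH Machine (name, description, project, test, executedOn) AS (
    VALUES ('Mach1', 'DB2_AIX', 'Team_1_PERS', 'TEST_1', CAST('2017-03-07' AS DATE)),
           ('Mach1', 'DB2_AIX', 'Team_1_PERS', 'TEST_1', CAST('2017-03-16' AS DATE)),
           ('Mach1', 'DB2_AIX', 'Team_1_PERS', 'TEST_1', CAST('2017-04-24' AS DATE)),
           ('Mach1', 'DB2_AIX', 'Team_1_PERS', 'TEST_1', CAST('2017-05-07' AS DATE)),
           ('Mach1', 'DB2_AIX', 'Team_1_PERS', 'TEST_2', CAST('2017-05-13' AS DATE)),
           ('Mach1', 'DB2_AIX', 'Team_1_PERS', 'TEST_1', CAST('2017-05-22' AS DATE)),
           ('Mach1', 'DB2_AIX', 'Team_1_PERS', 'TEST_1', CAST('2017-06-12' AS DATE))
),
Numbered AS (
    SELECT test, executedOn,
           -- position among all rows
           ROW_NUMBER() OVER (ORDER BY executedOn) AS overallRn,
           -- position among rows with the same machine/project/test
           ROW_NUMBER() OVER (PARTITION BY name, description, project, test
                              ORDER BY executedOn) AS perTestRn
    FROM Machine
)
SELECT test, executedOn, overallRn, perTestRn,
       overallRn - perTestRn AS groupingId
FROM Numbered
ORDER BY executedOn;
-- test    executedOn  overallRn  perTestRn  groupingId
-- TEST_1  2017-03-07  1          1          0
-- TEST_1  2017-03-16  2          2          0
-- TEST_1  2017-04-24  3          3          0
-- TEST_1  2017-05-07  4          4          0
-- TEST_2  2017-05-13  5          1          4
-- TEST_1  2017-05-22  6          5          1
-- TEST_1  2017-06-12  7          6          1
```

Within an unbroken run both counters advance together, so their difference stays constant; the interruption by TEST_2 shifts the difference for the second TEST_1 run from 0 to 1, which is what lets the GROUP BY separate the two runs.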
It seems like you're trying to get the first and last of "Start date". Write a GROUP BY query with MIN(Start date) and another with MAX(Start date) then union the results. You'll have to select DISTINCT or do another GROUP BY to eliminate the duplicates that will occur when there's only one date.

sql count all items that day until start of database isn't working because of time

I am trying to count the items in a database table of deployments. I counted the total number of items, 3879, by doing this:
use Bamboo_Version6
go
SELECT Count(STARTED_DATE)
FROM [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
But I have been struggling to get the number of items for each day, going back to the start. I have tried adapting answers to similar questions, like this:
select STARTED_Date, count(deploymentID)
from [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
WHERE STARTED_Date>=dateadd(day,datediff(day,0,STARTED_Date)- 7,0)
GROUP BY STARTED_Date
But this returns every id with a 1 beside it, because the dates include times, which makes each value unique. I tried CONVERT(varchar(12),STARTED_DATE,110) to fix the problem, but it still happens. How can I count the deployments per day without getting every id listed with a count of 1?
Remove the time component:
select cast(STARTED_Date as date) as dte, count(deploymentID)
from [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
group by cast(STARTED_Date as date)
order by dte;
I'm not sure what the WHERE clause is supposed to be doing, so I just removed it. If it is useful, add it back in.
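For what it's worth, the WHERE clause in the question compares each row's STARTED_Date with midnight seven days before that same value, so it is true for every non-NULL row and filters nothing. If the intent was "only deployments from the last seven days" (an assumption, the question does not say), a sketch for SQL Server 2008 or later could look like this; the alias `deployments` is a placeholder.

```sql
-- Assumption: the goal of the filter is "started within the last 7 days".
SELECT CAST(STARTED_DATE AS date) AS dte,
       COUNT(deploymentID)        AS deployments
FROM [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
-- Compare against today's date minus 7 days, not against the row's own value.
WHERE STARTED_DATE >= DATEADD(day, -7, CAST(GETDATE() AS date))
GROUP BY CAST(STARTED_DATE AS date)
ORDER BY dte;
```

Leaving the column itself untouched on the left side of the comparison also keeps the predicate usable with an index on STARTED_DATE, if one exists.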
Another way of doing this is with an OVER clause. A window function does not collapse rows, so DISTINCT is needed to get one row per day, and the ORDER BY must be left out of the OVER clause, otherwise the count becomes a running total:
SELECT DISTINCT cast(STARTED_DATE AS DATE) AS Deployment_date,
COUNT(deploymentID) OVER ( PARTITION BY cast(STARTED_DATE AS DATE) ) AS NumberofDeployments
FROM [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]

Group by two columns is possible?

I have this table:
ID Price Time
0 20,00 20/10/10
1 20,00 20/10/10
2 20,00 12/12/10
3 14,00 23/01/12
4 87,00 30/07/14
4 20,00 30/07/14
I use this SQL to get the list of all prices without repeated values:
SELECT * FROM myTable WHERE id in (select min(id) from %# group by Price)
This code returns the values (20,14,87,20)
But I would like to add another check, so that rows are grouped not only by price but also by date. The query above groups by price only; if I find a way to check the date as well, the code will return the values (20,20,14,87,20)
It repeats 20 twice at the start, but if we look at the table there are three rows with the price 20 (two with the date 20/10/10 and one with the date 12/12/10), and that is exactly what I want to get!
Could somebody help me?
To group by multiple columns, just separate them with commas in the GROUP BY list.
SELECT price FROM myTable group by price, time order by time
The GROUP BY looks at all distinct combinations of the listed columns' values and returns one row per combination. You can also use aggregate functions like SUM or MAX to pull additional columns into the results.
The following should work as long as all you need is the price/time combination. If you need to include the ID, things get more complicated:
SELECT `Price` FROM items
GROUP BY `Price`, `Time`
ORDER BY `Time`;
Here's a fiddle with the result in action: http://sqlfiddle.com/#!2/40821/1
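If the ID is needed as well, one option is to aggregate it, for example by taking the smallest ID in each price/time group. The sketch below assumes MySQL and recreates the sample rows from the question; the column types and the alias `first_id` are assumptions.

```sql
-- Sample rows from the question; column types are assumed.
CREATE TABLE items (`ID` INT, `Price` DECIMAL(10,2), `Time` VARCHAR(8));
INSERT INTO items (`ID`, `Price`, `Time`) VALUES
    (0, 20.00, '20/10/10'),
    (1, 20.00, '20/10/10'),
    (2, 20.00, '12/12/10'),
    (3, 14.00, '23/01/12'),
    (4, 87.00, '30/07/14'),
    (4, 20.00, '30/07/14');

-- One row per (Price, Time) combination, keeping the lowest ID of each group.
SELECT MIN(`ID`) AS first_id, `Price`, `Time`
FROM items
GROUP BY `Price`, `Time`
ORDER BY first_id;
```

This returns five rows with first_id 0, 2, 3, 4 and 4. The last two groups share ID 4, so their relative order is not guaranteed unless another column is added to the ORDER BY.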

Obtain maximum row_number inside a cross apply

I am having trouble calculating the maximum of a row_number in my SQL query.
I will explain it directly on the SQL Fiddle example, as I think it will be faster to understand: SQL Fiddle
Columns 'OrderNumber', 'HourMinute' and 'Code' are just to represent my table and hence, should not be relevant for coding purposes
Column 'DateOnly' contains the dates
Column 'Phone' contains the phones of my customers
Column 'Purchases' contains the number of times customers have bought in the last 12 months. Note that this value is provided for each date, so the 12 months time period is relative to the date we're evaluating.
Finally, the column I am trying to produce is the 'PREVIOUSPURCHASES' which counts the number of times the figure provided in the column 'Purchases' has appeared in the previous 12 months (for each phone).
You can see in the SQL Fiddle example what I have achieved so far. The column 'PREVIOUSPURCHASES' is producing what I want; however, it is also producing rows with lower values (i.e. only the maximum one is the one I need).
For instance, you can see that rows 4 and 5 are duplicated, one with a 'PREVIOUSPURCHASES' of 1 and the other with 2. I don't want to have the 4th row in this case.
I have thought about replacing the row_number with something like max(row_number), but I haven't been able to make it work (I already looked at similar posts on Stack Overflow...).
This should be implemented in SQL Server 2012.
Thanks in advance.
I'm not sure what kind of result set you want to see but is there anything wrong with what's returned with this?
SELECT c.OrderNumber, c.DateOnly, c.HourMinute, c.Code, c.Phone, c.Purchases, MAX(o.PreviousPurchases)
FROM cte c CROSS APPLY (
SELECT t2.DateOnly, t2.Phone,t2.ordernumber, t2.Purchases, ROW_NUMBER() OVER(PARTITION BY c.DateOnly ORDER BY t2.DateOnly) AS PreviousPurchases
FROM CurrentCustomers_v2 t2
WHERE c.Phone = t2.Phone AND t2.purchases<=c.purchases AND DATEDIFF(DAY, t2.DateOnly, c.DateOnly) BETWEEN 0 AND 365
) o
WHERE c.OrderNumber = o.OrderNumber
GROUP BY c.OrderNumber, c.DateOnly, c.HourMinute, c.Code, c.Phone, c.Purchases
ORDER BY c.DateOnly
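If the value wanted is simply the number of matching rows in the preceding 365 days, an aggregate inside the CROSS APPLY may be simpler than taking the maximum of a ROW_NUMBER(), because a COUNT(*) without GROUP BY always returns exactly one row per outer row. The sketch below is a guess at the intent: the table variable and its rows are hypothetical stand-ins for CurrentCustomers_v2, and the match conditions are copied from the query above.

```sql
-- Hypothetical stand-in for the CurrentCustomers_v2 table from the fiddle.
DECLARE @Customers TABLE (OrderNumber INT, DateOnly DATE, Phone VARCHAR(20), Purchases INT);
INSERT INTO @Customers VALUES
    (1, '2013-01-10', '555-0100', 1),
    (2, '2013-03-05', '555-0100', 1),
    (3, '2013-06-20', '555-0100', 2),
    (4, '2013-06-20', '555-0199', 1);

SELECT c.OrderNumber, c.DateOnly, c.Phone, c.Purchases, o.PreviousPurchases
FROM @Customers c
CROSS APPLY (
    -- One row per outer row: the count of rows for the same phone within
    -- the 365 days up to and including the current row's date.
    SELECT COUNT(*) AS PreviousPurchases
    FROM @Customers t2
    WHERE t2.Phone = c.Phone
      AND t2.Purchases <= c.Purchases
      AND DATEDIFF(DAY, t2.DateOnly, c.DateOnly) BETWEEN 0 AND 365
) o
ORDER BY c.DateOnly, c.OrderNumber;
-- OrderNumber 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 1
```

The count includes the current row itself, as the ROW_NUMBER() version does; subtract 1 or tighten the date condition if only strictly earlier rows should count.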