MySQL DATETIME group - sql

I want to group records by hour or by day.
Table A has two columns: id int and record_time datetime.
For example, two records look like this:
id record_time
-----------------------
1 2011-01-24 22:14:50
2 2011-01-24 22:14:50
I want to group by hour, so I run this query:
select *
from A
group by Hour(record_time);
However, the output is not what I want.
It returns only the first record; the second record does not appear.

What you call grouping sounds like it's actually sorting. GROUP BY collapses each group into a single row, which is why only one record comes back. Change group by to order by and see if that gets you what you want. If by "group" you actually mean "I want to keep related rows together in the result set", then what you need is called ordering.

SELECT *
FROM A
GROUP BY DATE_FORMAT(record_time, '%H')
UPDATE
SELECT *
FROM A
ORDER BY DATE_FORMAT(record_time, '%H%Y%m%d')
Try this: just ORDER BY, with no grouping.

You should use date_format so that each hour of each day gets its own group, like this:
group by date_format(record_time, '%Y%m%d%H');
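The difference between the two suggestions can be sketched in a few lines. This is only an illustration, not the asker's setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so strftime replaces HOUR() and DATE_FORMAT(), and the third row is a made-up addition so that there are two hour buckets.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); strftime plays the role of DATE_FORMAT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, record_time TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [
        (1, "2011-01-24 22:14:50"),
        (2, "2011-01-24 22:14:50"),
        (3, "2011-01-24 23:05:00"),  # hypothetical extra row in a different hour
    ],
)

# GROUP BY yields one row per hour bucket, so it is paired with an aggregate.
per_hour = conn.execute(
    "SELECT strftime('%Y-%m-%d %H', record_time) AS hour_bucket, COUNT(*) "
    "FROM A GROUP BY hour_bucket ORDER BY hour_bucket"
).fetchall()

# ORDER BY keeps every row and only arranges the rows by time.
all_rows = conn.execute(
    "SELECT id, record_time FROM A ORDER BY record_time, id"
).fetchall()

print(per_hour)
print(all_rows)
```

If the goal is a count or sum per hour, the grouped form is the one to use; if every record should still be visible, the ordered form is.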

Related

In Snowflake, I want to count duplicates in a table based on all the columns in the table without typing out every column name

I have a table with 60 columns in it. I would like to identify how many duplicates there are in the table based on all the columns being identical.
I don't want to have to type out every field name in the SELECT or GROUP BY clauses. Is there a way to do that?
You can use an approach like this for each table:
SELECT
MD5(OBJECT_CONSTRUCT(SRC.*)::VARCHAR) DUP_MD5, SUM(1) AS TOTAL_COUNT
FROM <table> SRC
GROUP BY 1
HAVING SUM(1) > 1;
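For a small table, the same idea (treat the whole row as the grouping key) can also be sketched on the client side. This is a different technique from the Snowflake query above: it pulls every row into memory, so it only suits data that fits there. The table name wide and its three columns are hypothetical stand-ins for the 60-column table, and sqlite3 is used only so the sketch runs anywhere.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the wide table; the column names are never referenced below.
conn.execute("CREATE TABLE wide (a INTEGER, b TEXT, c REAL)")
conn.executemany(
    "INSERT INTO wide VALUES (?, ?, ?)",
    [
        (1, "x", 1.5),
        (1, "x", 1.5),
        (2, "y", None),
        (2, "y", None),
        (2, "y", None),
        (3, "z", 0.0),
    ],
)

# SELECT * returns each row as a tuple of all its columns, so the tuple is the key.
row_counts = Counter(conn.execute("SELECT * FROM wide"))

# Keep only the rows that occur more than once.
duplicates = {row: n for row, n in row_counts.items() if n > 1}
print(duplicates)
```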

SQL - Select duplicates based on two columns in DB2

I am using DB2 and am trying to count duplicate rows in a table called ML_MEASURE. What I define as a duplicate in this table, is a row containing the same DATETIME and TAG_NAME value. So I tried this below:
SELECT
DATETIME,
TAG_NAME,
COUNT(*) AS DUPLICATES
FROM
ML_MEASURE
GROUP BY DATETIME, TAG_NAME
HAVING COUNT(*) > 1
The query doesn't fail, but I get an empty result, even though I know for a fact that I have at least one duplicate. When I tried the query below, I got the correct result for this specific tag_name and datetime:
SELECT
DATETIME,
TAG_NAME,
COUNT(*) AS DUPLICATES
FROM
ML_MEASURE
WHERE
DATETIME='2018-03-23 15:09:30' AND
TAG_NAME='HOG.613KU201'
GROUP BY
DATETIME,
TAG_NAME
The result of the second query looked like this:
DATETIME TAG_NAME DUPLICATES
--------------------- ------------ ----------
2018-03-23 15:09:30.0 HOG.613KU201 3
What am I doing wrong in the first query?
* UPDATE *
My table is row organized, not sure if that makes any difference.
Yes, you should get the same row back from the first query. If you had a NOT ENFORCED TRUSTED primary key or unique constraint on those two columns, the optimizer would be within its rights to trust the constraint and return no rows. However, from a quick test, I don't believe it does that for this query.
Do you have any indexes defined on the table?
(P.S. I assume you are not running the query from a shell prompt, where "> 1" would redirect the output to a file named 1.)
This worked for me:
SELECT * FROM (
SELECT DATETIME, TAG_NAME, COUNT(*) AS DUPLICATES
FROM ML_MEASURE
GROUP BY DATETIME, TAG_NAME
) WHERE DUPLICATES > 1
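In standard SQL the HAVING form and the derived-table form should return the same rows, which is why the empty result in the question is surprising. A quick way to check that on another engine is sketched below, using sqlite3 rather than DB2 (an assumption for portability); the second tag name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ML_MEASURE (DATETIME TEXT, TAG_NAME TEXT)")
conn.executemany(
    "INSERT INTO ML_MEASURE VALUES (?, ?)",
    [
        ("2018-03-23 15:09:30", "HOG.613KU201"),
        ("2018-03-23 15:09:30", "HOG.613KU201"),
        ("2018-03-23 15:09:30", "HOG.613KU201"),
        ("2018-03-23 15:09:30", "HOG.OTHER"),  # hypothetical non-duplicate row
    ],
)

# Form 1: filter the groups with HAVING.
having_rows = conn.execute(
    "SELECT DATETIME, TAG_NAME, COUNT(*) AS DUPLICATES "
    "FROM ML_MEASURE GROUP BY DATETIME, TAG_NAME HAVING COUNT(*) > 1"
).fetchall()

# Form 2: aggregate in a derived table, then filter outside it.
derived_rows = conn.execute(
    "SELECT * FROM ("
    "  SELECT DATETIME, TAG_NAME, COUNT(*) AS DUPLICATES"
    "  FROM ML_MEASURE GROUP BY DATETIME, TAG_NAME"
    ") AS d WHERE DUPLICATES > 1"
).fetchall()

print(having_rows)
print(derived_rows)
```

If the two forms disagree on the real system, that points at something outside the query text, such as the shell redirection mentioned above.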

SQL Server Sum multiple rows into one - no temp table

I would like to see a most concise way to do what is outlined in this SO question: Sum values from multiple rows into one row
that is, combine multiple rows while summing a column.
But how do I then delete the duplicates? In other words, I have data like this:
Person Value
--------------
1 10
1 20
2 15
And I want to sum the values for any duplicates (on the Person col) into a single row and get rid of the other duplicates on the Person value. So my output would be:
Person Value
-------------
1 30
2 15
And I would like to do this without using a temp table. I think I'll need to use OVER (PARTITION BY ...), but I'm not sure. I'm just trying to challenge myself not to do it the temp table way. I'm working with SQL Server 2008 R2.
Simply put, give me a concise statement that gets from my input to my output in the same table. So if my table is named People, a select * from People before the operation returns the first set above, and a select * from People after the operation returns the second set.
I'm not sure why you want to avoid a temp table, but here's one way to do it (though in my opinion this is overkill):
UPDATE MyTable SET VALUE = (SELECT SUM(Value) FROM MyTable MT WHERE MT.Person = MyTable.Person);
WITH DUP_TABLE AS
(SELECT ROW_NUMBER()
OVER (PARTITION BY Person ORDER BY Person) As ROW_NO
FROM MyTable)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
The first query updates every duplicate person's row to the summed value. The second query removes the duplicate rows, keeping one per person.
Demo: http://sqlfiddle.com/#!3/db7aa/11
All you're asking for is a simple SUM() aggregate function and a GROUP BY
SELECT Person, SUM(Value)
FROM myTable
GROUP BY Person
The SUM() by itself would sum up the values in a column, but when you add a secondary column and GROUP BY it, SQL will show distinct values from the secondary column and perform the aggregate function by those distinct categories.
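The in-place variant (sum into one row, then delete the rest) can be tried out as follows. This is a sketch under stated assumptions: it runs on sqlite3 rather than SQL Server, and because SQLite cannot delete through a CTE, it uses SQLite's implicit rowid to pick the row to keep instead of ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Person INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO People VALUES (?, ?)", [(1, 10), (1, 20), (2, 15)])

# Run both statements in one transaction so the table is never left half-collapsed.
with conn:
    # Step 1: write the per-person total into one "keeper" row (lowest rowid per person).
    conn.execute(
        "UPDATE People "
        "SET Value = (SELECT SUM(p.Value) FROM People AS p WHERE p.Person = People.Person) "
        "WHERE rowid IN (SELECT MIN(rowid) FROM People GROUP BY Person)"
    )
    # Step 2: delete every row that is not a keeper.
    conn.execute(
        "DELETE FROM People "
        "WHERE rowid NOT IN (SELECT MIN(rowid) FROM People GROUP BY Person)"
    )

result = conn.execute("SELECT Person, Value FROM People ORDER BY Person").fetchall()
print(result)
```

Updating only the keeper rows means no row that feeds a sum is changed before that sum is read, so the result does not depend on how the engine orders the updates.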

Effective way to separate a group into individual records

I'm grouping some records by their proximity in time. Here is what I do right now (timestamps are in Unix time).
First, I do a subselect to grab the records that are of interest to me:
(SELECT timestamp AS target_time FROM table WHERE something = cool) AS subselect
Then I want to look at the records that are close in time to those:
SELECT id FROM table, subselect WHERE ABS(target_time - timestamp) < 1800
But here is where I hit my problem. I only want the records where the time difference between the records around the target_time is > 20 mins. To do this, I group by the target_time and add a HAVING clause.
SELECT id FROM table, first WHERE ABS(target_time - timestamp) < 3600
GROUP BY target_time HAVING MAX(timestamp) - MIN(timestamp) > 1200
This is great, and all the records I don't like are gone, but now I only have the first id of each group, when I really want all of the ids. I can use GROUP_CONCAT, but that gives me a big mess I can't run any more queries on. What I would really like is to get all of the ids from all of the groups that are created. Do I need another SELECT statement? Or is there just a better way to structure what I've got?
Thank you,
A SQL nub.
Let me see if I have your problem right:
For a given row in a table, you want the set of similar rows, but only if the range of timestamps across those rows is greater than 20 minutes. You want to do this for all ids in the table.
If you simply want a list of ids that meet this criterion, it is fairly straightforward.
Given a table like:
create table foo (id bigint(4), section VARCHAR(2), modification datetime);
you can do:
-- timestampdiff returns the gap in seconds; subtracting datetime values directly does not
select id, foo.section, min_max.min_modification, min_max.max_modification, timestampdiff(second, min_max.min_modification, min_max.max_modification) as diff
from foo,
(select section, max(modification) max_modification, min(modification) min_modification from foo as inner_foo group by section) as min_max
where foo.section = min_max.section
and timestampdiff(second, min_max.min_modification, min_max.max_modification) > 1800;
You're doing a subselect based on the 'similar rows' criteria (in this case the column section) to get the minimum and maximum timestamps for that section. This min and max applies to all ids in that section. Hence, for section 'A', you will have a list of ids, same for section 'B'.
My assumption is you want an output that looks like:
id1, timestamp1, fieldA, fieldB
id1, timestamp2, fieldA, fieldB
id2, timestamp3, fieldA, fieldB
id2, timestamp4, fieldA, fieldB
id3, timestamp5, fieldA, fieldB
id3, timestamp6, fieldA, fieldB
but the timestamp for these records is BETWEEN 1200 and 1800 seconds of a "target_time" where something = cool?
SELECT data.id, data.timestamp, data.fieldA, data.fieldB, ..., data.fieldX
FROM events
JOIN data
WHERE events.something = cool_event -- Gives the 'target_time' of cool_event
AND ABS(events.timestamp - data.timestamp) BETWEEN 1200 and 1800 -- gives data records 'near' target time, but at least 20 minutes away.
If 'data' and 'events' are the same table, just use table aliases; joining a table to itself is known as a self-join.
SELECT data.id, data.timestamp, data.fieldA, data.fieldB, ..., data.fieldX
FROM events AS target, events AS data
WHERE target.something = cool_event -- gives the 'target_time' of cool_event
AND ABS(target.timestamp - data.timestamp) BETWEEN 1200 and 1800 -- gives data records 'near' target time, but at least 20 minutes away.
This sounds about right, and is without any group-by or aggregates needed.
You can order the resulting data if necessary.
-- J Jorgenson --
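One way to keep the HAVING filter from the question and still get every id back is to compute the qualifying groups in a subquery and join the detail rows to it. The sketch below assumes a single hypothetical table events(id, ts, something) with Unix-time integers, groups by the target row's id rather than by target_time, and runs on sqlite3; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, something TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, 10000, "cool"),   # target A
        (2, 9300, "other"),   # 700 s before A
        (3, 10700, "other"),  # 700 s after A, so A's group spans 1400 s
        (4, 50000, "cool"),   # target B
        (5, 50100, "other"),  # B's group spans only 100 s
    ],
)

query = """
SELECT t.id AS target_id, e.id AS member_id
FROM events AS t
JOIN events AS e ON ABS(t.ts - e.ts) < 1800          -- rows near each target
WHERE t.something = 'cool'
  AND t.id IN (                                      -- keep only targets whose group is wide enough
      SELECT t2.id
      FROM events AS t2
      JOIN events AS e2 ON ABS(t2.ts - e2.ts) < 1800
      WHERE t2.something = 'cool'
      GROUP BY t2.id
      HAVING MAX(e2.ts) - MIN(e2.ts) > 1200
  )
ORDER BY t.id, e.id
"""
members = conn.execute(query).fetchall()
print(members)
```

The aggregate decides which groups survive, and the outer join lists the individual rows, so no GROUP_CONCAT is needed and the result can be queried further.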

How to select 10 rows below the result returned by the SQL query?

Here is the SQL table:
KEY | NAME | VALUE
---------------------
13b | Jeffrey | 23.5
F48 | Jonas | 18.2
2G8 | Debby | 21.1
Now, if I type:
SELECT *
FROM table
WHERE VALUE = 23.5
I will get the first row.
What I need to accomplish is to get the first and the next two rows below. Is there a way to do it?
Columns are not sorted and WHERE condition doesn't participate in the selection of the rows, except for the first one. I just need the two additional rows below the returned one - the ones that were entered after the one which has been returned by the SELECT query.
Without a date column or an auto-increment column, you can't reliably determine the order the records were entered.
The physical order with which rows are stored in the table is non-deterministic.
You need to define an order to the results to do this. There is no guaranteed order to the data otherwise.
If by "the next 2 rows after" you mean "the next 2 records that were inserted into the table AFTER that particular row", you will need to use an auto incrementing field or a "date create" timestamp field to do this.
If each row has an ID column that is unique and auto incrementing, you could do something like:
SELECT * FROM table WHERE id > (SELECT id FROM table WHERE value = 23.5)
If I understand correctly, you're looking for something like:
SELECT * FROM table WHERE value <> 23.5
You could obviously write a program to do that, but I am assuming you want a query. What about using a UNION? You would also have to create a new column called value_id, or something along those lines, that is incremented sequentially (probably using a sequence). The idea is that value_id is incremented on every insert, so you can use it in a WHERE clause to return the remaining two rows you want.
For example:
Select * from table where value = 23.5
Union
Select * from table where value_id > 2 limit 2;
LIMIT 2 is used because the first query already returned the first row.
You need an order if you want to be able to think in terms of "before" and "after".
Assuming you have one you can use ROW_NUMBER() (see more here http://msdn.microsoft.com/en-us/library/ms186734.aspx) and do something like:
With MyTable as
(select row_number() over (order by key) as n, key, name, value
from table)
select key, name, value
from MyTable
where n >= (select n from MyTable where value = 23.5)
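Several of the answers above come down to the same point: once there is a column that records insertion order, "the matching row plus the next two" is a range query on that column. A minimal sketch of that, assuming a hypothetical auto-incrementing id column, a table named people, and a column key_code in place of KEY, run on sqlite3; the first and last rows are made up to show that rows outside the window are excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is the hypothetical auto-incrementing column that records insertion order.
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, key_code TEXT, name TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO people (key_code, name, value) VALUES (?, ?, ?)",
    [
        ("A00", "Anna", 30.0),     # hypothetical row inserted before the match
        ("13b", "Jeffrey", 23.5),
        ("F48", "Jonas", 18.2),
        ("2G8", "Debby", 21.1),
        ("Z99", "Zed", 19.0),      # hypothetical row beyond the two-row window
    ],
)

# Start at the matching row, walk forward in insertion order, stop after three rows.
window = conn.execute(
    "SELECT key_code, name, value FROM people "
    "WHERE id >= (SELECT id FROM people WHERE value = 23.5) "
    "ORDER BY id LIMIT 3"
).fetchall()
print(window)
```

The sketch assumes exactly one row matches value = 23.5; with several matches, the scalar subquery would pick just one of them.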