Lost trying to use DISTINCT and GROUP BY - sql

I'm having trouble with something that I thought would've been simple...
I have a simple model, Statistic, that stores a date (created_at), a user_fingerprint and a structure_id. From that, I'd like to build a graph showing the number of visitors per day.
So I did
@structure.statistics.order('DATE(created_at) ASC').group('DATE(created_at)').count
which works and returns what I expect:
=> {Sat, 18 May 2014=>50, Mon, 19 May 2014=>90}
Now I'd like the same, but I want to collapse all rows that share the same (date of created_at, user_fingerprint) pair, so that each visitor is counted at most once per day. For instance:
| created_at | user_fingerprint | structure_id |
|----------------------|------------------|--------------|
| Sat, 18 May 2014 2PM | '124512341' | 12 |
| Sat, 18 May 2014 4PM | '124512341' | 12 |
| Mon, 19 May 2014 6PM | '124512341' | 12 |
With this data, I would have:
=> {Sat, 18 May 2014=>1, Mon, 19 May 2014=>1}
# instead of
=> {Sat, 18 May 2014=>2, Mon, 19 May 2014=>1}
I would be able to do it in Ruby but I wondered if I could directly do it with SQL & Arel.
Solution regarding your answers
Here is what I did in the end:
@impressions = {}
# The following ensures there is a key even for days without any stats.
(15.days.ago.to_date..Date.today).each { |date| @impressions[date] = 0 }
@structure.statistics.where( Statistic.arel_table[:created_at].gt(Date.today - 15.days) )
  .order('DATE(created_at) ASC')
  .group('DATE(created_at)')
  .select('DATE(created_at) as created_at, COUNT(DISTINCT(user_fingerprint)) as user_count')
  .each { |stat| @impressions[stat.created_at] = stat.user_count }
I still need a bit of Ruby, but that's good enough for me.

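Your query would look something like this (Oracle dialect). Note that user_fingerprint must not appear in the GROUP BY, otherwise every group holds a single fingerprint and the distinct count is always 1:
select trunc(created_at), count(distinct user_fingerprint)
from statistic
group by trunc(created_at)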
The function for extracting the date portion of a datetime column differs between databases:
oracle: trunc(dt_column)
sql server: cast(dt_column As Date)
mysql: DATE(dt_column)
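To check the effect of COUNT(DISTINCT ...) grouped by day, here is a small sketch using Python's built-in sqlite3 module with the three sample rows from the question. SQLite, the ISO-formatted timestamps and the helper name visitors_per_day are assumptions made for illustration only; the original setup is Rails on a different database.

```python
import sqlite3

def visitors_per_day(conn):
    """Return (day, distinct visitor count) pairs, one per calendar day."""
    return conn.execute(
        """
        SELECT DATE(created_at) AS day,
               COUNT(DISTINCT user_fingerprint) AS visitors
        FROM statistics
        GROUP BY DATE(created_at)
        ORDER BY day
        """
    ).fetchall()

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statistics (created_at TEXT, user_fingerprint TEXT, structure_id INTEGER)"
)
conn.executemany(
    "INSERT INTO statistics VALUES (?, ?, ?)",
    [
        ("2014-05-18 14:00:00", "124512341", 12),
        ("2014-05-18 16:00:00", "124512341", 12),
        ("2014-05-19 18:00:00", "124512341", 12),
    ],
)

result = visitors_per_day(conn)
print(result)  # [('2014-05-18', 1), ('2014-05-19', 1)]
```

The two rows from the same visitor on 18 May collapse into a single count, which is the result the question asks for.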

@structure.statistics.order('DATE(created_at) ASC').group('DATE(created_at)').select('count(distinct(user_fingerprint)) as user_count').first.user_count

Related

how to group dates from ms access database as week of month using excel vba

I am using an MS Access 2010 database and working with Excel VBA to connect to it and run queries. Suppose I have a table named "MyTable" like this:
----------------------
| Date | Count |
----------------------
|7/7/16 | 12 |
----------------------
|7/8/16 | 15 |
----------------------
|7/15/16 | 18 |
----------------------
|7/18/16 | 16 |
----------------------
|8/7/16 | 15 |
----------------------
|8/8/16 | 10 |
----------------------
|8/15/16 | 9 |
----------------------
|8/16/16 | 18 |
----------------------
Now I want to use a query to get a table like this:
----------------------
|Week by Month | Sum |
----------------------
|July Week 2 | 27 |
----------------------
|July Week 3 | 18 |
----------------------
|July Week 4 | 16 |
----------------------
|Aug Week 2 | 25 |
----------------------
|Aug Week 3 | 27 |
----------------------
Use DatePart to get the week of the year, then subtract the week of the first day of the month (giving a zero-based week of the month), and then add 1 (to get a one-based week of the month):
Public Function WeekOfMonth(x As Date) As Integer
WeekOfMonth = DatePart("ww", x) - _
DatePart("ww", DateSerial(Year(x), Month(x), 1)) _
+ 1
End Function
Note that the Access SQL version should be identical to what comes after the = sign.
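For readers who want to check the arithmetic outside Access, here is a rough Python sketch of the same week-of-month calculation. It assumes weeks start on Sunday (DatePart's default first day of the week); the function name week_of_month is hypothetical. Because both week numbers fall in the same year, the difference reduces to counting how many Sunday boundaries lie between the first of the month and the given date.

```python
from datetime import date

def week_of_month(d: date) -> int:
    """One-based week of the month, with weeks running Sunday to Saturday."""
    first = d.replace(day=1)
    # Python's weekday() is Monday=0..Sunday=6; convert to Sunday=0..Saturday=6.
    first_weekday = (first.weekday() + 1) % 7
    return (d.day - 1 + first_weekday) // 7 + 1

# Dates from the question's sample table.
for d in (date(2016, 7, 7), date(2016, 7, 15), date(2016, 7, 18), date(2016, 8, 16)):
    print(d.isoformat(), week_of_month(d))
```

With the sample dates this yields weeks 2, 3, 4 and 3 respectively, matching the "July Week 2 / 3 / 4" and "Aug Week 3" rows in the desired output.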
I have solved this as below:
select weeknum, sum(count1) from (
select format(date1,'MMM') & " Week - " & int((datepart('d',date1,1,1) -1 ) / 7 + 1) as weeknum, count1 from MyTable)
group by weeknum
My goal was to show the week of the month where Week 1 is always the first full week starting in that month (so the first Sunday falls on the 1st through the 7th), and the days of the month before the first Sunday count as week 4 or 5 of the previous month.
After searching and failing to find exactly the right answer for my situation, I modified ComIntern's solution as follows. This is used in a control on a report, where [StartDate] is a criterion on the form that generates the report:
=IIf((DatePart("ww",[StartDate]-7)-DatePart("ww",DateSerial(Year([StartDate]-7),Month([StartDate]-7),1))+1)="5","1",DatePart("ww",[StartDate])-DatePart("ww",DateSerial(Year([StartDate]),Month([StartDate]),1))+0)
This shows the week of the month based on full weeks, and accounts for cases where the previous month's week 5 included one or more days from this month.
For example, week 5 of Oct 2017 is 29 Oct to 04 Nov. Without the IIf adjustment, 05-11 Nov is returned as week 2, but for my reporting purposes it is week 1 of Nov. I have tested this and it appears to always work; if you need the week of the month based on full weeks, this should work for you.

Oracle SQL Special Period Format

I have a special fiscal period in format YYYYMMM, for example
Feb of 2015 is 2015002
Nov of 2014 is 2014011
I need to subtract months from a period: 2 months before 2015002 is 2014012. But I can't simply do
SELECT '2015001' - 2 FROM DUAL
How can I do that?
You should first convert it to a date, then subtract months and convert back to the format you need.
with x(y) as (
select '2015002' from dual
)
select y,
to_date(y,'YYYY"0"MM'),
add_months(to_date(y,'YYYY"0"MM'),-2),
to_char(add_months(to_date(y,'YYYY"0"MM'),-2),'YYYY"0"MM')
from x
Results:
| Y | TO_DATE(Y,'YYYY"0"MM') | ADD_MONTHS(TO_DATE(Y,'YYYY"0"MM'),-2) | TO_CHAR(ADD_MONTHS(TO_DATE(Y,'YYYY"0"MM'),-2),'YYYY"0"MM') |
|---------|----------------------------|---------------------------------------|------------------------------------------------------------|
| 2015002 | February, 01 2015 00:00:00 | December, 01 2014 00:00:00 | 2014012 |
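The same period arithmetic can be sketched without date types by converting the period to a running month index. The following Python version is only an illustration of that idea, assuming the YYYYMMM layout described in the question (four-digit year followed by a zero-padded three-digit month); the name shift_period is hypothetical.

```python
def shift_period(period: str, months: int) -> str:
    """Shift a YYYYMMM fiscal period by a number of months (may be negative)."""
    year, month = int(period[:4]), int(period[4:])
    # Count months since year 0 so that year rollover falls out of divmod.
    index = year * 12 + (month - 1) + months
    new_year, new_month0 = divmod(index, 12)
    return f"{new_year:04d}{new_month0 + 1:03d}"

print(shift_period("2015002", -2))  # 2014012
```

Shifting by a positive number works the same way, for example from 2014011 forward three months to 2015002.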

SQL Query to Group Clusters of Related Events with Criteria

OK, this is tough to explain without drawing it out on a whiteboard or something... But here it goes. I've tried to be as clear as possible but let me know if this doesn't make sense....
I have a MS Access project that processes time series datasets from multiple Source Objects or "SOURCES", and multiple observation points or "RECEIVERS", and identifies events of interest based on time and spatial proximity. This gives me a table of triggers of possibly related events with the following fields.
CORRELATION_ID
RECEIVER_EVENT_ID
RECEIVER_NAME
RECEIVER_START_DATETIME
RECEIVER_END_DATETIME
SOURCE_EVENT_ID
SOURCE_NAME
SOURCE_START_DATETIME
SOURCE_END_DATETIME
Because I can get multiple source and receiver triggers happening at overlapping times, or times that are close to each other, I get a massive list of triggers and I would like to refine this list of triggers by grouping them further based on additional criteria.
I would like to specify 2 criteria for max allowable time gap between source events, MAX_SOURCE_GAP, and maximum allowable gap between receiver events MAX_RECEIVER_GAP. GAP is calculated from Start time of one trigger minus the end time of another trigger.
If the events are within this gap range, they need to be grouped, and the resulting group record must store the start time of the earliest event and the end time of the latest event. For the RECEIVER events, the RECEIVER_NAME must be the same (i.e. I don't want to group events from different RECEIVERS, because I still want to end up with a list of related RECEIVER<>SOURCE events). For the SOURCE events, the events must have been picked up by the same receiver; in other words, the RECEIVER_NAME must again be the same. I would also like the record to return a list of the names of the sources that are grouped. For this I was thinking I could use Allen Browne's ConcatRelated() function.
Updated: The third criterion defines the relationship between the grouped source events and the grouped receiver events, MAX_SOURCE_TO_RECEIVER_DELAY. This is the maximum allowable delay between the start time of a source and the triggering of the receiver. In other words, startTime_receiver - startTime_source <= MAX_SOURCE_TO_RECEIVER_DELAY. The receiver also cannot trigger before the source, so startTime_receiver >= startTime_source.
I think basically this will require a few steps. At least one subquery to group the SOURCE events. At least one subquery to group the RECEIVER events. And then a step to combine them so I can return something like this.
RECEIVER_NAME
MIN-RECEIVER_START_DATETIME
MAX-RECEIVER_END_DATETIME
MIN-SOURCE_START_DATETIME
MAX-SOURCE_END_DATETIME
LIST_OF_SOURCES <--field that looks like "SOURCE10, SOURCE 24, SOURCE 51" generated from Allen Browne's ConcatRelated() function.
I think I understand the methodology but I am having trouble properly grouping things where there are more than 2 triggers. I can probably tackle concatenating the names of the sources with ConcatRelated if I get the proper time grouping figured out.
Update: I have uploaded some sample data to SQLfiddle.com.
The resulting table I am essentially trying to come up with would look like this for this sample data set:
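| RECEIVER_NAME | MIN-RECEIVER_START_DATETIME | MAX-RECEIVER_END_DATETIME | SOURCE_LIST | MIN-SOURCE_START_DATETIME | MAX-SOURCE_END_DATETIME |
|---------------|-----------------------------|---------------------------|-------------------------|---------------------------|-------------------------|
| RECEIVER1 | 2012-04-08 05:08 | 2012-04-08 06:22 | SOURCE1,SOURCE2,SOURCE3 | 2012-04-08 02:10 | 2012-04-08 05:25 |
| RECEIVER2 | 2012-05-08 10:05 | 2012-04-08 14:55 | SOURCE1,SOURCE2 | 2012-05-08 10:01 | 2012-05-08 13:45 |
| RECEIVER2 | 2012-06-08 06:55 | 2012-06-08 21:19 | SOURCE2 | 2012-05-08 14:55 | 2012-05-08 16:22 |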
sorry, wow what a pain trying to post a table. I couldn't find any better way.
As I mentioned in my comment, no criteria have been applied to produce this result. The events are grouped by RECEIVER_EVENT_ID and RECEIVER_START_DATETIME. (I assume RECEIVER_EVENT_ID always maps to a single RECEIVER_NAME, hence I chose the event ID, but you can also group by RECEIVER_NAME; it's up to you.)
For now, this gives:
240 | RECEIVER1 | August, 04 2012 05:08:00+0000
241 | RECEIVER2 | August, 05 2012 10:05:00+0000
242 | RECEIVER2 | August, 05 2012 14:15:00+0000
243 | RECEIVER2 | August, 06 2012 06:55:00+0000
Then you can find the min and max values related to the events that are being grouped.
If you would like to group events 241 and 242, you need to find logic that groups them together.
Here is the code for grouping by event and event start time.
I hope this gives you an idea of the GROUP_CONCAT function in MySQL as well as of grouping. Let me know if you find the exact SQL statement for your question or a faster solution; I'm very much interested to see that too.
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE relatedEvents
(
CORRELATION_ID INT auto_increment primary key,
RECEIVER_EVENT_ID INT,
RECEIVER_NAME VARCHAR(20),
RECEIVER_START_DATETIME DATETIME,
RECEIVER_END_DATETIME DATETIME,
SOURCE_EVENT_ID INT,
SOURCE_NAME VARCHAR(20),
SOURCE_START_DATETIME DATETIME,
SOURCE_END_DATETIME DATETIME
);
INSERT INTO relatedEvents
(RECEIVER_EVENT_ID, RECEIVER_NAME, RECEIVER_START_DATETIME,
RECEIVER_END_DATETIME, SOURCE_EVENT_ID, SOURCE_NAME, SOURCE_START_DATETIME, SOURCE_END_DATETIME)
VALUES
('240', 'RECEIVER1', '2012-08-04 05:08:00', '2012-08-04 06:22', '1', 'SOURCE1', '2012-08-04 02:10', '2012-08-04 02:40'),
('240', 'RECEIVER1', '2012-08-04 05:08:00', '2012-08-04 06:22', '2', 'SOURCE2', '2012-08-04 02:30', '2012-08-04 03:10'),
('240', 'RECEIVER1', '2012-08-04 05:08:00', '2012-08-04 06:22', '3', 'SOURCE2', '2012-08-04 03:15', '2012-08-04 03:30'),
('240', 'RECEIVER1', '2012-08-04 05:08:00', '2012-08-04 06:22', '4', 'SOURCE3', '2012-08-04 05:01', '2012-08-04 05:25'),
('241', 'RECEIVER2', '2012-08-05 10:05:00', '2012-08-05 10:35', '5', 'SOURCE1', '2012-08-05 10:01', '2012-08-05 10:15'),
('241', 'RECEIVER2', '2012-08-05 10:05:00', '2012-08-05 10:35', '6', 'SOURCE2', '2012-08-05 12:15', '2012-08-05 12:17'),
('242', 'RECEIVER2', '2012-08-05 14:15:00', '2012-08-05 14:55', '7', 'SOURCE1', '2012-08-05 13:35', '2012-08-05 13:45'),
('243', 'RECEIVER2', '2012-08-06 06:55:00', '2012-08-06 21:19', '8', 'SOURCE2', '2012-08-05 14:55', '2012-08-05 16:22');
Query 1:
SELECT
RECEIVER_EVENT_ID AS EVENT_ID,
O_R.RECEIVER_NAME AS Receiver_name,
(SELECT MIN(RECEIVER_START_DATETIME) FROM relatedEvents AS I_R WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID GROUP BY I_R.RECEIVER_EVENT_ID) AS min_r_st,
(SELECT MAX(RECEIVER_END_DATETIME) FROM relatedEvents AS I_R WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID GROUP BY I_R.RECEIVER_EVENT_ID) AS max_r_et,
(SELECT GROUP_CONCAT(DISTINCT SOURCE_NAME) FROM relatedEvents AS I_R WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID GROUP BY I_R.RECEIVER_EVENT_ID) AS Sources,
(SELECT MIN(SOURCE_START_DATETIME) FROM relatedEvents AS I_R WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID GROUP BY I_R.RECEIVER_EVENT_ID) AS min_s_st,
(SELECT MAX(SOURCE_END_DATETIME) FROM relatedEvents AS I_R WHERE I_R.RECEIVER_EVENT_ID = O_R.RECEIVER_EVENT_ID GROUP BY I_R.RECEIVER_EVENT_ID) AS max_s_et,
COUNT(RECEIVER_START_DATETIME) AS RST
FROM relatedEvents AS O_R
GROUP BY RECEIVER_EVENT_ID, RECEIVER_START_DATETIME
ORDER BY RECEIVER_START_DATETIME ASC
Results:
| EVENT_ID | RECEIVER_NAME | MIN_R_ST | MAX_R_ET | SOURCES | MIN_S_ST | MAX_S_ET | RST |
|----------|---------------|-------------------------------|-------------------------------|-------------------------|-------------------------------|-------------------------------|-----|
| 240 | RECEIVER1 | August, 04 2012 05:08:00+0000 | August, 04 2012 06:22:00+0000 | SOURCE1,SOURCE2,SOURCE3 | August, 04 2012 02:10:00+0000 | August, 04 2012 05:25:00+0000 | 4 |
| 241 | RECEIVER2 | August, 05 2012 10:05:00+0000 | August, 05 2012 10:35:00+0000 | SOURCE1,SOURCE2 | August, 05 2012 10:01:00+0000 | August, 05 2012 12:17:00+0000 | 2 |
| 242 | RECEIVER2 | August, 05 2012 14:15:00+0000 | August, 05 2012 14:55:00+0000 | SOURCE1 | August, 05 2012 13:35:00+0000 | August, 05 2012 13:45:00+0000 | 1 |
| 243 | RECEIVER2 | August, 06 2012 06:55:00+0000 | August, 06 2012 21:19:00+0000 | SOURCE2 | August, 05 2012 14:55:00+0000 | August, 05 2012 16:22:00+0000 | 1 |
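The query above does not apply the gap criterion from the question, so events 241 and 242 stay separate. One possible way to express that grouping logic, including chains of more than two triggers, is a sort-and-sweep merge. The Python sketch below is only an illustration of the idea, not Access or MySQL code: the function name merge_by_gap, the tuple layout and the 4-hour MAX_RECEIVER_GAP value are all assumptions. The data are the four receiver events from the fiddle.

```python
from datetime import datetime, timedelta

def merge_by_gap(events, max_gap):
    """Merge (name, start, end) events of the same name whose gap
    (next start minus current group end) is at most max_gap."""
    merged = []
    # Sorting by name, then start, lets one pass extend the latest group.
    for name, start, end in sorted(events, key=lambda e: (e[0], e[1])):
        if merged and merged[-1][0] == name and start - merged[-1][2] <= max_gap:
            _, group_start, group_end = merged[-1]
            merged[-1] = (name, group_start, max(group_end, end))
        else:
            merged.append((name, start, end))
    return merged

receiver_events = [
    ("RECEIVER1", datetime(2012, 8, 4, 5, 8), datetime(2012, 8, 4, 6, 22)),
    ("RECEIVER2", datetime(2012, 8, 5, 10, 5), datetime(2012, 8, 5, 10, 35)),
    ("RECEIVER2", datetime(2012, 8, 5, 14, 15), datetime(2012, 8, 5, 14, 55)),
    ("RECEIVER2", datetime(2012, 8, 6, 6, 55), datetime(2012, 8, 6, 21, 19)),
]

MAX_RECEIVER_GAP = timedelta(hours=4)  # assumed value for illustration
groups = merge_by_gap(receiver_events, MAX_RECEIVER_GAP)
for group in groups:
    print(group)
```

With a 4-hour gap, events 241 and 242 (3 h 40 min apart) merge into one RECEIVER2 group running from 10:05 to 14:55, while event 243 stays on its own. The same function could be applied to the source events with MAX_SOURCE_GAP before joining the two result sets.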

Counting columns in the same table

I've been trying to develop a cashflow statement in Access 2007. This would be easy in Excel using a formula such as
=SUM(B6:M6) / COUNTIF(B6:M6, ">0")
but I can't seem to wrap my head around it in Access. And I need this for every company we enter data on. The cashflow statement is supposed to look like this (since I can't post a picture yet):
----------------------------------------------------------------------------------------------------------------
Particulars | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Average |
Sales---------------------->
Salary------>
Transportation----->
and about 10 other items in the row, all with entries for Jan till Dec, however, sometimes we take 6 months worth of data and sometimes for all 12 months. (Imagine a basic excel sheet with items on the first column and headers for the next 12-13 columns).
In Access, I made a table for each item with the months as columns, e.g. tblRcpt --> |rcpt_ID|Jan|Feb|... and so on until Dec, for all the items. They will then be arranged and presented in an entry form designed to look similar to the table above, and later I would query and link them together to present the complete cashflow statement.
Now comes the question: I need to average the columns together (as you can see in the rightmost column), BUT I only want to average those months that have been filled in (sometimes in accounting people enter '0' where there is no data), so I can't just sum the columns and divide by twelve. It has to be dynamic, and all functions seem to center on counting and averaging ROWs, not COLUMNs.
Thanks for just bearing with me and reading this, any help would be much appreciated.
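Try this (Access SQL has no CASE expression, so there each term would be written as IIf(Jan = 0, 0, 1) instead):
(Jan + Feb + ... + Dec) /
( case when Jan = 0 then 0 else 1 end
+ case when Feb = 0 then 0 else 1 end
+ ...
+ case when Dec = 0 then 0 else 1 end )
as Avg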
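The expression above divides by zero when every month is 0, so some guard is needed for empty rows. As a rough sketch of the intended arithmetic, here is the same calculation in Python; the function name average_nonzero and the choice to return None for an all-empty row are assumptions, not part of the original answer.

```python
def average_nonzero(values):
    """Average only the filled-in months; 0 and None count as 'no data'."""
    filled = [v for v in values if v]
    # Guard against dividing by zero when no month has been filled in.
    return sum(filled) / len(filled) if filled else None

# Six months of data entered, the remaining six left at 0.
sales = [500, 1000, 0, 750, 0, 250, 0, 0, 0, 0, 600, 500]
print(average_nonzero(sales))  # 600.0
```

In Access the equivalent guard would wrap the division in an IIf that checks whether the count of non-zero months is 0.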
Your table structure should be:
Particulars | Month | Amount
Sales 1 500
Sales 2 1000
Salary 1 80000
...and so on. You can either not enter rows when you don't have a value for that month, or you can handle them in the SQL statement (as I have below):
SELECT Particulars, AVG(Amount) AS AverageAmount
FROM MyTable
WHERE Amount <> 0
GROUP BY Particulars;

Columns to Rows in SQL Server

I have a query returning a table which looks like:
Location | November | December | January | February | March | ... | October |
CT 30 70 80 90 60 30
etc.
and I'd like it to look like:
Location | Month | Value |
CT November 30
CT December 70
CT January 80
...
CT October 30
It looks like an unpivot, but I didn't pivot to get it into that form since the base table has the months as columns (the values are just sums of values grouped by location). I've seen plenty of rows-to-columns questions but I haven't found a good columns-to-rows answer, so I'm hoping someone can help me.
Thanks!
You will want to use the UNPIVOT function. It takes the values in your columns and turns them into rows:
select location, month, value
from <yourquery here>
unpivot
(
value
for month in (January, February, March, April, May, June, July, August,
September, October, November, December)
) unpiv
See SQL Fiddle with Demo
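To make the columns-to-rows reshaping concrete, here is a minimal Python sketch of what UNPIVOT does to each input row. The function name unpivot and the dictionary-based row layout are assumptions for illustration; like SQL Server's UNPIVOT, the sketch skips columns whose value is NULL (None here).

```python
def unpivot(rows, id_col, value_cols):
    """Turn one wide row per location into one long row per (location, month)."""
    return [
        {id_col: row[id_col], "Month": col, "Value": row[col]}
        for row in rows
        for col in value_cols
        if row[col] is not None  # UNPIVOT drops NULL values
    ]

wide = [{"Location": "CT", "November": 30, "December": 70, "January": 80}]
long_rows = unpivot(wide, "Location", ["November", "December", "January"])
for r in long_rows:
    print(r)
```

The order of value_cols determines the order of the output rows, which is how the sample output can start at November rather than January.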