How do I declare a column name that changes?
I take some data from a DB, and I am interested in the last 12 months, so I only take events that happened in, let's say, '2016-07', '2016-06', and so on.
Then, I want my table to look like this:
event type | 2016-07 | 2016-06
-----------+---------+--------
A          |      12 |      13
B          |      21 |      44
C          |      98 |      12
How can I make the columns be named with that YYYY-MM pattern, keeping in mind that the report can be run at any time, so the column names would change?
A simplified query, for the previous month only:
select
    count(event),
    date_year_month,
    event_name
from data_base
where date_year_month = TO_CHAR(add_months(current_date, -1), 'YYYY-MM')
group by event_name, date_year_month
I don't think there is an automated way of pivoting the year-month columns and changing the number of columns in the result dynamically based on the data.
However, if you are looking for a pivoting solution, you can accomplish it using table functions in Netezza.
select event_name, year_month, event_count
from event_counts_groupby_year_month,
     table(inza.inza.nzlua('
        local rows = {}
        -- emit two output rows (one per pivoted month column) for each input row
        function processRow(y2016m06, y2016m07)
            rows[1] = { 201606, y2016m06 }
            rows[2] = { 201607, y2016m07 }
            return rows
        end
        -- declare the shape of the generated table
        function getShape()
            local columns = {}
            columns[1] = { "year_month", integer }
            columns[2] = { "event_count", double }
            return columns
        end',
     y2016m06, y2016m07));
You could probably build a wrapper around this that generates the query dynamically, based on the year-month values present in the table, using a shell script.
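Alternatively, if fixed column aliases are acceptable (relative labels instead of literal YYYY-MM headers), plain conditional aggregation avoids the table function entirely. A hedged sketch against the question's table (table and column names are taken from the simplified query above; the month_minus_* aliases are made up):

-- Event counts for the two previous months; extend the pattern to 12
-- columns as needed. The column headers themselves still cannot be
-- the literal YYYY-MM values here.
select
    event_name,
    sum(case when date_year_month = TO_CHAR(add_months(current_date, -1), 'YYYY-MM')
             then 1 else 0 end) as month_minus_1,
    sum(case when date_year_month = TO_CHAR(add_months(current_date, -2), 'YYYY-MM')
             then 1 else 0 end) as month_minus_2
from data_base
group by event_name;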
I have a table in MS Access like this:
The columns are:
---------------------------------------------
| *Date* | *Article* | *Distance* | Value |
---------------------------------------------
Date, Article and Distance are Primary Keys, so the combination of them is always unique.
The column Distance has discrete values from 0 to 27.
I need to transform this table into a table like this:
---------------------------------------------------------------------------------
| *Date* | *Article* | Value from Distance 0 | Value Dis. 1 | ... | Value Dis. 27 |
---------------------------------------------------------------------------------
I really don't know a SQL statement for this task. I needed a fast solution, which is why I wrote an Excel macro; it worked fine but was very inefficient and took several hours to complete. Now that the amount of data is ten times higher, I can't use this macro anymore.
You can try the following pivot query:
SELECT
Date,
Article,
MAX(IIF(Distance = 0, Value, NULL)) AS val_0,
MAX(IIF(Distance = 1, Value, NULL)) AS val_1,
...
MAX(IIF(Distance = 27, Value, NULL)) AS val_27
FROM yourTable
GROUP BY
Date,
Article
Note that Access does not support CASE expressions, but it does offer a function called IIF() which takes the form of:
IIF(condition, value if true, value if false)
which essentially behaves the same way as CASE in other RDBMS.
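For comparison, here is the same pivot cell written with CASE (in an RDBMS that supports it) next to the Access IIF() version:

-- CASE version (not valid in Access):
MAX(CASE WHEN Distance = 0 THEN Value END) AS val_0
-- Access IIF() equivalent:
MAX(IIF(Distance = 0, Value, NULL)) AS val_0

As an aside, if auto-generated column captions are acceptable (the crosstab names its columns after the data values, i.e. 0 ... 27 rather than val_0 ... val_27), Access's own crosstab syntax can build the 28 columns without spelling them out. A sketch against the same table:

TRANSFORM Max([Value])
SELECT [Date], Article
FROM yourTable
GROUP BY [Date], Article
PIVOT Distance;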
I have a table A within a dataset in BigQuery. This table has multiple columns; one of them, called hits_eventInfo_eventLabel, has values like below:
{ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property
ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;}
If you write this string out in a tabular form, it contains the following data:
ID      | Score
--------+----------
AEEMEO  | 8.990000
SEAMCV  | 8.990000
HBLION  | -
DNSEAWH | 0.391670
CP1853  | -
HI2367  | -
H25600  | -
Some IDs have scores, some don't. I have multiple records with similar strings populated under the column hits_eventInfo_eventLabel within the table.
My question is how can I parse this string successfully WITHIN BIGQUERY so that I can get a list of property ids and their respective recommendation scores (if existing)? I would like to have the order in which the IDs appear in the string to be preserved after parsing this data.
Would really appreciate any info on this. Thanks in advance!
I would use a combination of SPLIT to separate the string into rows and REGEXP_EXTRACT to extract the columns, i.e.
select
  regexp_extract(x, r'ID:([^,]*)') as id,
  regexp_extract(x, r'Score:([\d\.]*)') as score
from (
  select split(x, ';') x
  from (
    select 'ID:AEEMEO,Score:8.990000;ID:SEAMCV,Score:8.990000;ID:HBLION;Property ID:DNSEAWH,Score:0.391670;ID:CP1853;ID:HI2367;ID:H25600;' as x))
It produces the following result:
Row  id       score
1    AEEMEO   8.990000
2    SEAMCV   8.990000
3    HBLION   null
4    DNSEAWH  0.391670
5    CP1853   null
6    HI2367   null
7    H25600   null
You can write your own JavaScript functions in BigQuery to get exactly what you want now: http://googledevelopers.blogspot.com/2015/08/breaking-sql-barrier-google-bigquery.html
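For reference, a hedged sketch of the same parsing in BigQuery standard SQL (the table name dataset.table_a is an assumption, since the question only says "table A within a dataset"; WITH OFFSET preserves the original order of the IDs, as requested):

#standardSQL
SELECT
  pos,
  REGEXP_EXTRACT(part, r'ID:([^,;]*)') AS id,
  REGEXP_EXTRACT(part, r'Score:([\d.]*)') AS score
FROM dataset.table_a,
  UNNEST(SPLIT(hits_eventInfo_eventLabel, ';')) AS part WITH OFFSET pos
WHERE part != ''   -- drop the empty element after the trailing ';'
ORDER BY pos;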
I'm creating a campaign event scheduler that allows for frequencies such as "Every Monday", "May 6th through 10th", "Every day except Sunday", etc.
I've come up with a solution that I believe will work fine (not yet implemented); however, it uses "LIKE" in the queries, which I've never been too fond of. If anyone else has a suggestion that can achieve the same result with a cleaner method, please suggest it!
+----------------------+
| Campaign Table |
+----------------------+
| id:int |
| event_id:foreign_key |
| start_at:datetime |
| end_at:datetime |
+----------------------+
+-----------------------------+
| Event Table |
+-----------------------------+
| id:int |
| valid_days_of_week:string | < * = ALL. 345 = Tue, Wed, Thur. etc.
| valid_weeks_of_month:string | < * = ALL. 25 = 2nd and 5th weeks of a month.
| valid_day_numbers:string | < * = ALL. L = last. 2,7,17,29 = 2nd day, 7th, 17th, 29th, etc.
+-----------------------------+
A sample event schedule would look like this:
valid_days_of_week = '1357' (Sun, Tue, Thu, Sat)
valid_weeks_of_month = '*' (All weeks)
valid_day_numbers = ',1,2,5,6,8,9,25,30,'
Using today's date (6/25/15) as an example, we have the following information to query with:
Day of week: 5 (Thursday)
Week of month: 4 (4th week in June)
Day number: 25
Therefore, to fetch all of the events for today, the query would look something like this:
SELECT c.*
FROM campaigns AS c
LEFT JOIN events AS e
ON c.event_id = e.id
WHERE
( e.valid_days_of_week = '*' OR e.valid_days_of_week LIKE '%5%' )
AND ( e.valid_weeks_of_month = '*' OR e.valid_weeks_of_month LIKE '%4%' )
AND ( e.valid_day_numbers = '*' OR e.valid_day_numbers LIKE '%,25,%' )
That (untested) query would ideally return the example event above. The "LIKE" queries are what have me worried. I want these queries to be fast.
By the way, I'm using PostgreSQL.
Looking forward to excellent replies!
Use arrays:
CREATE TABLE events (id INT NOT NULL, dow INT[], wom INT[], dn INT[]);

-- Note: GiST indexes over int[] come from the intarray extension
-- (CREATE EXTENSION intarray); PostgreSQL's built-in GIN array
-- indexes also work out of the box.
CREATE INDEX ix_events_dow ON events USING GIST(dow);
CREATE INDEX ix_events_wom ON events USING GIST(wom);
CREATE INDEX ix_events_dn ON events USING GIST(dn);

INSERT
INTO events
VALUES (1, '{1,3,5,7}', '{0}', '{1,2,5,6,8,9,25,30}'); -- 0 means "any"
Then query:
SELECT *
FROM events
WHERE dow && '{0, 5}'::INT[]
  AND wom && '{0, 4}'::INT[]
  AND dn && '{0, 25}'::INT[]
This will allow using the indexes to filter the data.
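For clarity, && is PostgreSQL's array overlap operator: it returns true when the two arrays share at least one element, which is how the 0 sentinel implements the "any" wildcard. A quick sanity check:

SELECT ARRAY[1,3,5,7] && ARRAY[0,5];  -- true: 5 appears in both arrays
SELECT ARRAY[2,4,6]   && ARRAY[0,5];  -- false: no common element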
I have a table with the following structure
|user_id | place | type_of_place | money_earned| time |
|--------+-------+---------------+-------------+------|
| | | | | |
The table is very large, several millions of rows. The data is in a PostgreSQL 9.1 database.
I want to calculate, per user_id and type_of_place: the mean and the standard deviation of money_earned, the top 5 places (ordered by counts), and the most frequent hour of time (the mode).
The resulting data must be in this form:
| user_id | type_of_place | avg | stddev | top5_places | mode |
+---------+---------------+-----+--------+------------------+------+
| 1 | tp1 | 10 | 1 | {p1,p2,p3,p4,p5} | 8 |
| 2 | tp1 | 3 | 2 | {p3,p4} | 23 |
| 1 | tp3 | 1 | 1 | {p1} | 4 |
etc.
Is there a way of doing this efficiently with window functions?
What if I want to group by week as well (i.e. using another column that represents the week number)?
Thank you!
A standard GROUP BY query will get you most of the way:
SELECT
user_id,
type_of_place,
avg(money_earned) AS avg,
stddev(money_earned) AS stddev
FROM
earnings -- I'm not sure what your data table is called...
GROUP BY
user_id,
type_of_place
This leaves the top5_places and mode columns. These are both also aggregates, but not ones which are defined in the standard PostgreSQL installation. Luckily, you can add them.
Here's a page discussing how to define a mode aggregate function: http://wiki.postgresql.org/wiki/Aggregate_Mode
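For reference, the definition on that wiki page is roughly along these lines (an array-accumulating sketch; from PostgreSQL 9.4 on, the built-in ordered-set aggregate mode() WITHIN GROUP (ORDER BY ...) does this natively, but the question targets 9.1):

-- Accumulate values into an array, then pick the most frequent one.
CREATE OR REPLACE FUNCTION _final_mode(anyarray)
  RETURNS anyelement AS $$
    SELECT a
    FROM unnest($1) a
    GROUP BY 1
    ORDER BY count(*) DESC, 1
    LIMIT 1;
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE mode(anyelement) (
  SFUNC     = array_append,   -- append each input value to the state array
  STYPE     = anyarray,
  INITCOND  = '{}',
  FINALFUNC = _final_mode     -- pick the most frequent element at the end
);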
Once you have a mode aggregate function, assuming time is a timestamp of some kind, the expression you will add to the select list will be:
SELECT
...
mode(extract(hour FROM time)) AS mode -- Add this expression
FROM
...
Assuming order by money
For top5_places, there are several approaches, but the quickest is probably to use PostgreSQL's builtin array_agg function, and take the first 5 elements:
SELECT
...
(array_agg(place ORDER BY money_earned DESC))[1:5] AS top5_places -- Add this expression
FROM
...
One alternative is to define another aggregate called (for instance) top5, which performs the same function. This could be more efficient if there are many distinct places for each user/type of place combination, since it can stop accumulating after the first 5, whereas the above expression will generally build a complete array of all places, and then truncate to the first 5.
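A hedged sketch of that idea (all names hypothetical): the state function refuses to grow the array past 5 elements, so although it is still invoked once per row, the state stays small instead of accumulating every place:

CREATE OR REPLACE FUNCTION top5_accum(anyarray, anyelement)
  RETURNS anyarray AS $$
    SELECT CASE
             WHEN array_length($1, 1) >= 5 THEN $1   -- already full: keep state
             ELSE array_append($1, $2)               -- otherwise append
           END;
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE top5(anyelement) (
  SFUNC    = top5_accum,
  STYPE    = anyarray,
  INITCOND = '{}'
);

-- Usage (aggregate ORDER BY, available since PostgreSQL 9.0):
--   top5(place ORDER BY money_earned DESC) AS top5_places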
This assumes that a place has a unique earnings entry for each user/type combination. If a place can occur more than once, and you want to sort by sum(money_earned) for each place, then you need to use a subquery like in the examples below...
Order by counts
Ok, so the places should be ordered by how often they occur. Here's a quick way, which uses a couple of subqueries -- add this as an expression to the select-clause of the above query:
(SELECT (array_agg(place ORDER BY cnt DESC))[1:5]
 FROM (SELECT place, count(*)
       FROM earnings AS t2
       WHERE t2.user_id = earnings.user_id
         AND t2.type_of_place = earnings.type_of_place
       GROUP BY place) AS s (place, cnt)
) AS top5_places
The inner subquery called s evaluates to a table of each place for that user/type combination, and the number of times it occurs (which I've called cnt). These are then fed to array_agg in descending order of that count.
I suspect there could be much neater (and probably more efficient) ways of writing it. If not, then I would recommend trying to move this complicated expression into a function or aggregate, if you can...
Histogram of places in each hour
We'll use a similar expression, which will return the array of counts, ordered by hour:
(SELECT array_agg(cnt ORDER BY hour DESC)
 FROM (SELECT extract(hour FROM time), count(*)
       FROM earnings AS t2
       WHERE t2.user_id = earnings.user_id
         AND t2.type_of_place = earnings.type_of_place
       GROUP BY 1) AS s (hour, cnt)
) AS hourly_histogram
(Add that to the select-clause of the original query.)
First off: I've been using MySQL forever and am now upgrading to PostgreSQL. The SQL syntax is much stricter and some behavior is different, hence my question.
I've been searching around for how to merge rows in a postgresql query on a table such as
id | name | amount
0 | foo | 12
1 | bar | 10
2 | bar | 13
3 | foo | 20
and get
name | amount
foo | 32
bar | 23
The closest I've found is Merge duplicate records into 1 records with the same table and table fields
SQL returning duplicates of name:
scope :tallied, lambda { group(:name, :amount).select("charges.name AS name,
SUM(charges.amount) AS amount,
COUNT(*) AS tally").order("name, amount desc") }
What I need is
scope :tallied, lambda { group(:name, :amount).select("DISTINCT ON(charges.name) charges.name AS name,
SUM(charges.amount) AS amount,
COUNT(*) AS tally").order("name, amount desc") }
except that, rather than returning the first row for a given name, it should return a merge of all rows with a given name (amounts added).
In MySQL, appending .group(:name) to the select (without needing the initial group) would work as expected.
This seems like an everyday sort of task which should be easy. What would be a simple way of doing this? Please point me on the right path.
P.S. I'm trying to learn here (so are others), don't just throw sql in my face, please explain it.
I've no idea what RoR is doing in the background, but I'm guessing that group(:name, :amount) will run a query that groups by name, amount. The one you're looking for is group by name:
select name, sum(amount) as amount, count(*) as tally
from charges
group by name
If you append amount to the group by clause, the query will do just that -- i.e. count(*) would return the number of times each amount appears per name, and the sum() would return that number times that amount.
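To make the difference concrete with the sample rows from the question (foo 12, bar 10, bar 13, foo 20):

-- Grouping by name AND amount: each distinct (name, amount) pair is
-- its own group, so nothing gets merged.
SELECT name, amount, count(*) AS tally
FROM charges
GROUP BY name, amount;
--  foo | 12 | 1
--  bar | 10 | 1
--  bar | 13 | 1
--  foo | 20 | 1

-- Grouping by name alone merges the amounts, as desired:
SELECT name, sum(amount) AS amount, count(*) AS tally
FROM charges
GROUP BY name;
--  foo | 32 | 2
--  bar | 23 | 2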