I have a table looking like this:
ReadingDate,=avg(Cost)
11/04/2011,£10.00
28/05/2011,£326.00
02/06/2011,£12.00
28/06/2011,£53.00
10/09/2011,£956.00
11/10/2011,£63.00
01/01/2012,£36.00
11/04/2012,£150.00
12/05/2012,£100.00
I know how to take the average for a day or a month, but how do I apply a restriction like 'between 01.05.2012 and 11.11.2013' and get a single average for that range?
If you would like to do this in the load script, you can create a temporary table in which you perform the average over the desired range, and then store the result in a variable.
I used your source data for the example below:
SET DateFormat='DD/MM/YYYY';
MyData:
LOAD * INLINE [
ReadingDate, Cost
11/04/2011, 10.00
28/05/2011, 26.00
02/06/2011, 12.00
28/06/2011, 53.00
10/09/2011, 956.00
11/10/2011, 63.00
01/01/2012, 36.00
11/04/2012, 150.00
12/05/2012, 100.00
];
AverageData:
LOAD
avg(Cost) as AvgCost
RESIDENT MyData
// note: the date literals rely on the DateFormat set above
WHERE (ReadingDate > '28/05/2011') AND (ReadingDate < '01/01/2012');

// read the single result row into the variable before dropping the table
LET AverageCost = peek('AvgCost', 0, 'AverageData');
DROP TABLE AverageData;
Here, AverageCost is your variable and contains a single number (in this case 271), which you can then use later in the script, for example:
MyData2:
NOCONCATENATE
LOAD
ReadingDate,
Cost,
$(AverageCost) as AverageCost
RESIDENT MyData;

// drop the original to avoid a synthetic key between the two tables
DROP TABLE MyData;
This then results in the following:
11/04/2011, 10.00, 271
28/05/2011, 26.00, 271
02/06/2011, 12.00, 271
28/06/2011, 53.00, 271
10/09/2011, 956.00, 271
11/10/2011, 63.00, 271
01/01/2012, 36.00, 271
11/04/2012, 150.00, 271
12/05/2012, 100.00, 271
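As an alternative to doing this in the script, the same restriction can be expressed directly in a chart expression with set analysis; a sketch, assuming ReadingDate follows the DD/MM/YYYY format set above:
// chart expression: average Cost only for readings in the given range
Avg({<ReadingDate = {">=01/05/2012<=11/11/2013"}>} Cost)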
The MS Access SQL query shown below produces a result set. I want to take the completion_metric_query_result column from the result and add it to an actual table as a computed column. Is there any way I can do this?
Previously I attempted to use an update query, but update queries in Access do not work with aggregate functions like AVG().
SELECT
PROMIS_LT_Long_ID.Short_ID,
FORMAT (100 * (AVG(IIF(PROMIS_LT_Long_ID.Status = 'Complete', 1.0, 0))), '#,##0.00') AS completion_metric_query_result
FROM
PROMIS_LT
INNER JOIN
PROMIS_LT_Long_ID ON PROMIS_LT.SubjectID = PROMIS_LT_Long_ID.Short_ID
GROUP BY
PROMIS_LT_Long_ID.Short_ID;
Result set:
SubjectID | completion_metric_query_result
----------+-------------------------------
02345800 | 12.00
13938432 | 12.50
13491349 | 0.00
12484028 | 15.00
12993248 | 75.00
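Since UPDATE queries reject SQL aggregates, a common workaround is to swap the aggregate for a domain aggregate such as DAvg(), which Access does allow inside an UPDATE. A hedged sketch, assuming a numeric column completion_metric (hypothetical name) has been added to PROMIS_LT_Long_ID first, and that averaging over PROMIS_LT_Long_ID alone matches the join semantics of the original query:
UPDATE PROMIS_LT_Long_ID
SET completion_metric = 100 * DAvg(
    "IIF(Status = 'Complete', 1.0, 0)",
    "PROMIS_LT_Long_ID",
    "Short_ID = '" & [Short_ID] & "'"
);
Note that DAvg() re-evaluates per row, so this can be slow on large tables; storing a computed aggregate also denormalizes the data, which is why a saved query is usually preferred when possible.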
I created a Webi report with the result in a crosstab table.
When I try to keep only the customers whose transactions total more than 50, like customer 222, the rows with amounts below 50 do not show up.
Example: customer 222 did two actions:
1. 3/25/2018 with an amount of 209 GB
2. 3/29/2018 with an amount of 14 GB
The sum of both is 223.
I need to get all customers whose total over all days is more than 50, but when I try that by adding a filter, the action on 3/29/2018 is not shown and only the action on 3/25/2018 is what I get.
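A likely cause is that the filter is being applied per row rather than per customer total. In Webi this is usually handled with a calculation context; a sketch, where [Customer] and [Amount] are placeholders for your actual object names:
=Sum([Amount]) In ([Customer])
Define that as a variable, e.g. TotalPerCustomer, and filter the block on TotalPerCustomer > 50 instead of on the row-level amount; the rows for all dates of qualifying customers should then remain visible.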
I think I am missing something basic here; I can't seem to figure out what it is.
I am querying a BigQuery date-partitioned table from Google Cloud Datalab. Most of the other queries fetch data as expected; I am not sure why, for this particular table, a plain SELECT does not work while a count(1) query does.
%%sql
select * from Mydataset.sample_sales_yearly_part limit 10
I get the error below:
KeyErrorTraceback (most recent call last)
/usr/local/lib/python2.7/dist-packages/IPython/core/formatters.pyc in __call__(self, obj)
    305                 pass
    306             else:
--> 307                 return printer(obj)
    308             # Finally look for special method names
    309             method = get_real_method(obj, self.print_method)

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/commands/_bigquery.pyc in _repr_html_query_results_table(results)
    999
   1000 def _repr_html_query_results_table(results):
-> 1001   return _table_viewer(results)
   1002
   1003

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/commands/_bigquery.pyc in _table_viewer(table, rows_per_page, fields)
    969     meta_time = ''
    970
--> 971   data, total_count = datalab.utils.commands.get_data(table, fields, first_row=0, count=rows_per_page)
    972
    973   if total_count < 0:

/usr/local/lib/python2.7/dist-packages/datalab/utils/commands/_utils.pyc in get_data(source, fields, env, first_row, count, schema)
    226     return _get_data_from_table(source.results(), fields, first_row, count, schema)
    227   elif isinstance(source, datalab.bigquery.Table):
--> 228     return _get_data_from_table(source, fields, first_row, count, schema)
    229   else:
    230     raise Exception("Cannot chart %s; unsupported object type" % source)

/usr/local/lib/python2.7/dist-packages/datalab/utils/commands/_utils.pyc in _get_data_from_table(source, fields, first_row, count, schema)
    174   gen = source.range(first_row, count) if count >= 0 else source
    175   rows = [{'c': [{'v': row[c]} if c in row else {} for c in fields]} for row in gen]
--> 176   return {'cols': _get_cols(fields, schema), 'rows': rows}, source.length
    177
    178

/usr/local/lib/python2.7/dist-packages/datalab/utils/commands/_utils.pyc in _get_cols(fields, schema)
    108     if schema:
    109       f = schema[col]
--> 110       cols.append({'id': f.name, 'label': f.name, 'type': typemap[f.data_type]})
    111     else:
    112       # This will only happen if we had no rows to infer a schema from, so the type

KeyError: u'DATE'
QueryResultsTable job_Ckq91E5HuI8GAMPteXKeHYWMwMo
You may be hitting an issue that was just fixed in https://github.com/googledatalab/pydatalab/pull/68 (but not yet included in a Datalab release).
The background is that the new "Standard SQL" support in BigQuery added new datatypes that can show up in the results schema, and Datalab was not yet updated to handle those.
The next release of Datalab should fix this, but in the meantime you can work around it by wrapping your date fields in an explicit cast to TIMESTAMP as part of your query.
For example, if you see that error with the following code cell:
%%sql
SELECT COUNT(*) as count, d FROM <mytable>
(where 'd' is a field of type 'DATE'), then you can work around the issue by casting that field to a TIMESTAMP like this:
%%sql
SELECT COUNT(*) as count, TIMESTAMP(d) FROM <mytable>
For your particular query, you'll have to change '*' to the list of fields, so that you can cast the one with a date to a timestamp.
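Applied to the query above, that would look something like the following, where sale_id, amount, and sale_date are placeholder field names; substitute your table's actual schema and cast its DATE column:
%%sql
SELECT sale_id, amount, TIMESTAMP(sale_date) AS sale_date
FROM Mydataset.sample_sales_yearly_part
LIMIT 10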
I have 2 rows like below:
941 78 252 3008 86412 1718502 257796 2223252 292221 45514 114894
980 78 258 3064 88318 1785623 269374 2322408 305467 46305 116970
I want to insert the current timestamp into each row as it is loaded.
Finally, each row in my Hive table should look like this:
941 78 252 3008 86412 1718502 257796 2223252 292221 45514 114894 2014-10-21
980 78 258 3064 88318 1785623 269374 2322408 305467 46305 116970 2014-10-22
Is there any way I can insert the timestamp directly in Hive without using a Pig script?
You can use from_unixtime(unix_timestamp()) while inserting.
For example, suppose you have the following tables:
create table t1(c1 String);
create table t2(c1 String, c2 timestamp);
Now you can populate table t2 from t1 with the current timestamp:
insert into table t2 select *, from_unixtime(unix_timestamp()) from t1;
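That writes a full 'yyyy-MM-dd HH:mm:ss' value. Since the desired rows above show only a date, it may help that from_unixtime() also takes an optional format string. A sketch, where t3 is a hypothetical table with the extra column declared as string (a bare 'yyyy-MM-dd' value is not a complete timestamp literal):
create table t3(c1 string, c2 string);
-- writes values like 2014-10-21 rather than 2014-10-21 14:03:52
insert into table t3 select *, from_unixtime(unix_timestamp(), 'yyyy-MM-dd') from t1;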
I have this query and would like to indent the output and get the total of the last column.
Right now it gives:
person |year|dossiers
------------------------------------------------|----|--------
9210124 |1110| 166
9210124 |1111| 198
9210124 |1112| 162
9210161 |1110| 183
9210161 |1111| 210
9210161 |1112| 142
And I would like to have:
person |year|dossiers
------------------------------------------------|----|--------
9210124 |1110| 166
|1111| 198
|1112| 162
9210161 |1110| 183
|1111| 210
|1112| 142
total 1061
Here is the query:
select
pers_nr "person",
to_char(import_dt,'YYMM') "year and month",
count(pers_nr) "dossiers"
from
rdms_3codon
where
trunc(import_dt) >= trunc(trunc(sysdate, 'Q') -1, 'Q')
and trunc(import_dt) < trunc(sysdate, 'Q')-1/(24*60*60)
group by
pers_nr,
to_char(import_dt,'YYMM')
order by
pers_nr
Could someone help me, please?
As noted in the comments, this is a client function, not a database one. For example, if you are using SQL*Plus, you can use:
break on person on report
compute sum label total of dossiers on report
The break on person suppresses the duplicate person values; the on report break together with the compute generates the total at the bottom. (Each BREAK command replaces the previous one, which is why both break columns must go into a single command.) SQL*Plus output formatting is documented in the SQL*Plus User's Guide.
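When experimenting with these settings, a few standard SQL*Plus commands are useful for inspecting and resetting the current state:
break            (no arguments: lists the current break definition)
compute          (no arguments: lists the current compute definitions)
clear breaks     (removes all break definitions)
clear computes   (removes all compute definitions)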
Try this one. It will at least give you the totals; the repeated pers_nr values can either be blanked out (e.g. using RANK() over pers_nr) or handled in the code of your application, if any...
select
pers_nr "person",
to_char(import_dt,'YYMM') "year and month",
sum(count(pers_nr)) over (order by to_char(import_dt,'YYMM')) "running total"
FROM ....
Hope it helps a bit.
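Spelled out against the original query, and using over () so that each row carries the grand total (1061 in the example) instead of a running sum; a sketch, since the client still decides where to display the single total:
select
pers_nr "person",
to_char(import_dt, 'YYMM') "year and month",
count(pers_nr) "dossiers",
sum(count(pers_nr)) over () "total"   -- grand total repeated on every row
from rdms_3codon
where trunc(import_dt) >= trunc(trunc(sysdate, 'Q') - 1, 'Q')
and trunc(import_dt) < trunc(sysdate, 'Q') - 1/(24*60*60)
group by pers_nr, to_char(import_dt, 'YYMM')
order by pers_nr;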