BigQuery: adding a month to a search - google-bigquery

I'm looking to add a month filter to a search in GDELT.
If I filter on the year only, the code works great,
but if I look for a month (like 2019-11) the result is empty.
Thanks for your help!
SELECT
person,
SUM(count_of_mentions) AS all_mentions
FROM
`static-beach-312816.56146_2021_class_materials.israel_media_person_sum`
WHERE
domain = 'walla.co.il'
AND SUBSTR(CAST(DATE AS STRING), 1,4) = '2019'
GROUP BY
person
ORDER BY
all_mentions DESC
LIMIT
50;

You can try:
EXTRACT(month FROM DATE) = 11 AND EXTRACT(year FROM DATE) = 2019
See the date functions reference:
https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions
You may also try the one below, which is closer to what you were attempting, but I would go with the option above.
CONCAT(CAST(EXTRACT(YEAR FROM DATE(Date)) AS STRING), "-", CAST(EXTRACT(MONTH FROM DATE(Date)) AS STRING)) = '2019-11'

-- Do not forget to change Project.Dataset.Table below to yours!
-- This is for January 2019
SELECT
person,
SUM(count_of_mentions) AS all_mentions
FROM
`gdeltgkgtry.56146_2021_class_materials.israel_media_person_sum`
WHERE
domain = 'walla.co.il'
AND SUBSTR(CAST(DATE AS STRING), 1,6) = '201901'
GROUP BY
person
ORDER BY
all_mentions DESC
LIMIT
50;
-- This is for all months in 2019
SELECT
person,
SUBSTR(CAST(DATE AS STRING), 1,6) AS year_month,
SUM(count_of_mentions) AS all_mentions
FROM
`gdeltgkgtry.56146_2021_class_materials.israel_media_person_sum`
WHERE
domain = 'walla.co.il'
AND SUBSTR(CAST(DATE AS STRING), 1,4) = '2019'
GROUP BY
person,
year_month
ORDER BY
all_mentions DESC
LIMIT
50;
-- This is for all months in 2019 for Benjamin Netanyahu (note that here we capture ALL types of Netanyahu mentions in all urls)
WITH
person_all AS (
SELECT
person,
SUBSTR(CAST(DATE AS STRING), 1,6) AS year_month,
SUM(count_of_mentions) AS all_mentions
FROM
`gdeltgkgtry.56146_2021_class_materials.israel_media_person_sum`
WHERE
domain = 'walla.co.il'
AND SUBSTR(CAST(DATE AS STRING), 1,4) = '2019'
AND LOWER(person) LIKE '%in netanyah%'
GROUP BY
person,
year_month )
SELECT
'Benjamin Netanyahu',
year_month,
SUM(all_mentions) AS all_mentions
FROM
person_all
GROUP BY
1,
2
ORDER BY
2 ASC;
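
Why the '2019-11' filter comes back empty can be sketched outside BigQuery. The snippet below is only an illustration in Python, assuming (as the '201901' filter above suggests) that DATE is stored as an integer in YYYYMMDD form. The sample values and the helper name are made up for the example.

```python
# Hypothetical sample values; assumes DATE is an integer such as 20191105 (YYYYMMDD).
sample_dates = [20190103, 20191105, 20191130, 20200101]


def year_month_prefix(date_value: int) -> str:
    """Mimic SUBSTR(CAST(DATE AS STRING), 1, 6): the first six characters."""
    return str(date_value)[:6]


# A 4-character prefix can never equal the 7-character string '2019-11',
# and the stored value contains no hyphen, so this filter matches nothing.
hyphen_matches = [d for d in sample_dates if str(d)[:4] == "2019-11"]

# Comparing a 6-character prefix against '201911' matches November 2019.
prefix_matches = [d for d in sample_dates if year_month_prefix(d) == "201911"]

# With 8-digit values, integer division gives the same result without casting.
integer_matches = [d for d in sample_dates if d // 100 == 201911]

print(hyphen_matches, prefix_matches, integer_matches)
```

If the column turns out to hold longer values (for example with a time part appended), the prefix comparison would still apply, but the integer division would need a different divisor.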

Related

Count records for first day of every month in a year

I have a table with 4 columns and a huge number of records. It has the following structure:
DATE_ENTERED  EMP_NAME  DATA    ORIGINATED
01-JAN-20     A         545454  APPLE
I want to count the number of records for the first day of every month in a year.
Is there any way to fetch the data for the first day of each month?
In Oracle you can use the TRUNC function on the date as follows:
SELECT TRUNC(DATE_ENTERED, 'MON') AS FIRST_DAY, COUNT(1) AS CNT
FROM YOUR_TABLE
WHERE TRUNC(DATE_ENTERED) = TRUNC(DATE_ENTERED, 'MON')
GROUP BY TRUNC(DATE_ENTERED, 'MON')
Please note that TRUNC(DATE_ENTERED, 'MON') returns the first day of the month for DATE_ENTERED.
Cheers!!
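
The idea behind the filter (a row counts only when its date equals the first day of its own month) can be sketched in Python. This is a hedged illustration, not Oracle code: the sample dates and the function name are invented, and d.replace(day=1) stands in for truncating to the month.

```python
from collections import Counter
from datetime import date

# Hypothetical DATE_ENTERED values standing in for the table's rows.
date_entered = [
    date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 15),
    date(2020, 2, 1), date(2020, 3, 2), date(2020, 12, 1),
]


def first_day_counts(dates, year):
    """Count rows whose date is the first day of its month in the given year."""
    counts = Counter()
    for d in dates:
        # d.replace(day=1) plays the role of truncating the date to its month.
        if d.year == year and d == d.replace(day=1):
            counts[d] += 1
    return dict(sorted(counts.items()))


result = first_day_counts(date_entered, 2020)
print(result)
```

Months with no row on their first day (March in this sample) simply do not appear in the result, which matches what a GROUP BY on the filtered rows would return.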
SELECT Year, Month, COUNT(*)
FROM
(
SELECT
YEAR(DATE_ENTERED) Year,
MONTH(DATE_ENTERED) Month,
DAY(DATE_ENTERED) Day
FROM your_table
WHERE DAY(DATE_ENTERED) = 1
) A
GROUP BY Year, Month
Generally, WHERE DAY(DATE_ENTERED) = 1 keeps only the records dated on the first day of a month. You can then group by the Year and Month values to get a count for each year and month.
You mean something like
SELECT COUNT(*)
FROM Table
WHERE DAY(DATE_ENTERED) = 1 AND
YEAR(DATE_ENTERED) = Some_Year
GROUP BY DATE_ENTERED
You can also use DATE_ENTERED BETWEEN 'YYYY0101' and 'YYYY1231' (replace the YYYY with the year you want to retrieve data for) instead of YEAR(DATE_ENTERED) = Some_Year, if performance is an issue.
You can use something like this:
select * from your_table
where DAY(DATE_ENTERED) = 1
and DATE_ENTERED between '2020-01-01' and '2020-12-31'
For the count, use this:
select count(*) from your_table
where DAY(DATE_ENTERED)= 1
and DATE_ENTERED between '2020-01-01' and '2020-12-31'
UPDATE
select * from your_table where Extract(day FROM DATE_ENTERED) = 1 and DATE_ENTERED between '01-JAN-20' and '01-DEC-20';
For the count of records:
select count(*) from your_table where Extract(day FROM DATE_ENTERED) = 1 and DATE_ENTERED between '01-JAN-20' and '01-DEC-20';
UPDATE-2
select EXTRACT(month from DATE_ENTERED) as Count,
to_char(to_date(DATE_ENTERED, 'DD-MM-YYYY'), 'Month') from your_table
where Extract(day FROM DATE_ENTERED) = 1
and DATE_ENTERED between '01-JAN-20' and '01-DEC-20'
group by EXTRACT(month from DATE_ENTERED),
to_char(to_date(DATE_ENTERED, 'DD-MM-YYYY'), 'Month');

PostgreSQL - How to use window function to pull the max value from a column

I am a beginner, so apologies in advance for simple/non-technical terminology.
I have a table where each row shows the company name, day/month/year, and how many visits they received in that day. My goal is to show which company had the highest number of visits for January 2018.
I was able to find how many visits each company received in January 2018 using this query:
select to_char(datecolumn,'Mon') as monthkey, extract(year from datecolumn) as yearkey, companyname, sum(visits) as sumvisits
from t1
where monthkey = 'Jan' and yearkey = '2018'
group by monthkey, yearkey, companyname
order by companyname
Now I need to use a window function to find the max value of sumvisits of January along with the corresponding company, but I'm stuck.
I've tried partitioning by month:
select companyname, monthkey, max(sumvisits) over (partition by monthkey) as maxvisits
from (select to_char(f_date,'Mon') as monthkey, extract(year from f_date) as yearkey, companyname, sum(visits) as sumvisits
from t1
where monthkey = 'Jan' and yearkey = '2018'
group by monthkey, yearkey, dealername
order by companyname)
But this query just gives me the max visits of one company and lists it for every company.
I don't think I should use the limit function or anything like that because the query needs to be applicable to multiple months.
I want to see:
monthkey  yearkey  companyname  sumvisits
Jan       2018     ABCInc       5000
Can someone please help advise what I'm doing wrong/point me in the right direction?
Just use order by and limit. Note that WHERE cannot reference the SELECT aliases, so filter on the date column itself:
select to_char(datecolumn,'Mon') as monthkey, extract(year from datecolumn) as yearkey, companyname,
sum(visits) as sumvisits
from t1
where datecolumn >= date '2018-01-01' and datecolumn < date '2018-02-01'
group by monthkey, yearkey, companyname
order by sumvisits desc
limit 1;
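
If the query has to cover several months at once, limit 1 only returns a single row overall, and a window function can instead rank companies within each month. Below is a minimal sketch of that idea, run through Python's sqlite3 module as a stand-in for PostgreSQL. The sample rows are invented, dates are stored as text, and substr() replaces to_char(); SQLite 3.25 or later is assumed for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (datecolumn TEXT, companyname TEXT, visits INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("2018-01-05", "ABCInc", 3000),
        ("2018-01-20", "ABCInc", 2000),
        ("2018-01-11", "XYZCo", 4000),
        ("2018-02-02", "ABCInc", 100),
        ("2018-02-09", "XYZCo", 700),
    ],
)

# Inner query: total visits per company per month.
# Middle query: rank companies inside each month by that total.
# Outer query: keep only the top-ranked company of each month.
query = """
SELECT monthkey, companyname, sumvisits
FROM (
    SELECT monthkey, companyname, sumvisits,
           RANK() OVER (PARTITION BY monthkey ORDER BY sumvisits DESC) AS rnk
    FROM (
        SELECT substr(datecolumn, 1, 7) AS monthkey,
               companyname,
               SUM(visits) AS sumvisits
        FROM t1
        GROUP BY monthkey, companyname
    ) AS monthly
) AS ranked
WHERE rnk = 1
ORDER BY monthkey
"""
top_per_month = conn.execute(query).fetchall()
print(top_per_month)
```

RANK() keeps ties, so two companies with the same monthly total would both be returned; ROW_NUMBER() would pick just one of them.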

Get last data recorded of the date and group it by month

tbl_totalMonth has id, time, date and kwh columns.
I want to get the last recorded value of each month, grouped per month, so the result would be the name of the month and the kwh.
the result should be something like this:
month | kwh
------------
January | 150
February | 400
The query I tried (but it returns the max kwh, not the last kwh recorded):
SELECT DATENAME(MONTH, a.date) as monthly, max(a.kwh) as kwh
from tbl_totalMonth a
WHERE date >= DATEADD(yy,DATEDIFF(yy,0, GETDATE() -1 ),0)
group by DATENAME(MONTH, a.date)
I suspect you need something quite different:
select *
from (
select *
, row_number() over(partition by month(a.date), year(a.date) order by a.date DESC) as rn
from tbl_totalMonth a
WHERE date >= DATEADD(yy,DATEDIFF(yy,0, GETDATE() -1 ),0)
) d
where rn = 1
To get "the last kwh recorded (per month)" you need to use row_number() which - per month - will order the rows (descending) and give each one a row number. When that number is 1 you have "the most recent" row for that month, and you won't need group by at all.
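
The difference between "max kwh" and "last kwh" per month can be seen on a few rows. The following is a minimal sketch run through Python's sqlite3 module rather than SQL Server: the sample readings are invented, dates are stored as text, and strftime('%Y-%m', ...) replaces the month()/year() pair. SQLite 3.25 or later is assumed for row_number().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_totalMonth (id INTEGER, date TEXT, kwh INTEGER)")
conn.executemany(
    "INSERT INTO tbl_totalMonth VALUES (?, ?, ?)",
    [
        (1, "2024-01-10", 300),
        (2, "2024-01-31", 150),
        (3, "2024-02-05", 900),
        (4, "2024-02-28", 400),
    ],
)

# Number the rows of each month from newest to oldest and keep row 1.
last_query = """
SELECT month_key, kwh
FROM (
    SELECT strftime('%Y-%m', date) AS month_key,
           kwh,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y-%m', date)
               ORDER BY date DESC
           ) AS rn
    FROM tbl_totalMonth
) AS d
WHERE rn = 1
ORDER BY month_key
"""
last_per_month = conn.execute(last_query).fetchall()

# For comparison: MAX(kwh) picks the largest reading, not the most recent one.
max_query = """
SELECT strftime('%Y-%m', date) AS month_key, MAX(kwh)
FROM tbl_totalMonth
GROUP BY month_key
ORDER BY month_key
"""
max_per_month = conn.execute(max_query).fetchall()

print(last_per_month)
print(max_per_month)
```

The two results differ whenever the latest reading of a month is not also its largest, which is exactly the case described in the question.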
You could use group by and month:
select datename(month, date), sum(kwh)
from tbl_totalMonth
where date = (select max(date) from tbl_totalMonth)
group by datename(month, date)
If you need only the last row for each month, then you should use:
select datename(month, a.date), a.kwh
from tbl_totalMonth a
inner join (
select max(date) as max_date
from tbl_totalMonth
group by month(date)) t on t.max_date = a.date

Filter rows in PostgreSQL based on values of consecutive rows in one column

So I'm working with the following postgresql table:
(Screenshot: 10 rows from the PostgreSQL table)
For each business_id, I want to filter out those businesses whose review_count doesn't meet a specific threshold for 2 consecutive months (or rows). The threshold depends on the city the business_id is in. For example, in the screenshot above, we can assume rows with city = Charlotte have a review_count threshold of >= 2, and those with city = Las Vegas have a review_count threshold of >= 3. If a business_id does not have at least one instance of consecutive months with review_counts meeting the specified threshold, I want to filter it out.
I want this query to return only the business_ids that meet this condition (as well as all the other columns in the table that go along with that business_id). The composite primary key on this table is (business_id, year, month).
Some months, as you may notice, are missing from the data (month 9 of the second business_id). If that is the case, I do NOT want to count 2 rows as 'consecutive months'. For example, for the business in Las Vegas, I do NOT want to consider month 8 to 10 as 'consecutive months', even though they appear in consecutive rows.
I've tried something like this, but have kind of run into a wall and don't think it's getting me far:
SELECT *
FROM us_business_monthly_review_growth
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM us_business_monthly_review_growth
GROUP BY business_id, year, month
HAVING (city = 'Las Vegas'
AND (CASE WHEN COUNT(review_count >= 2 * 2.21) >= 2))
OR (city = 'Charlotte' AND (CASE WHEN COUNT(review_count >= 2 * 1.95) >= 2))
I'm new to Postgres and StackOverflow, so if you have any feedback on the way I asked this question please don't hesitate to let me know! =)
UPDATE:
Thanks to some help from @Gordon Linoff, I found the following solution:
SELECT *
FROM us_businesses_monthly_growth_and_avg
WHERE business_id IN (SELECT distinct(business_id)
FROM (SELECT *,
lag(year) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_year,
lag(month) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_month,
lag(review_count) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_review_count
FROM us_businesses_monthly_growth_and_avg
) AS usga
WHERE (city = 'Charlotte' AND review_count >= 4 * 1.95 AND prev_review_count >= 4 * 1.95 AND (YEAR * 12 + month) = (prev_year * 12 + prev_month) + 1)
OR (city = 'Las Vegas' AND review_count >= 4 * 3.31 AND prev_review_count >= 4 * 3.31 AND (YEAR * 12 + month) = (prev_year * 12 + prev_month) + 1));
You can do this with lag():
select distinct business_id
from (select t.*,
lag(year) over (partition by business_id order by year, month) as prev_year,
lag(month) over (partition by business_id order by year, month) as prev_month,
lag(rating) over (partition by business_id order by year, month) as prev_rating
from us_business_monthly_review_growth t
) t
where rating >= $threshhold and prev_rating >= $threshhold and
(year * 12 + month) = (prev_year * 12 + prev_month) + 1;
The only trick is setting the threshold value. I have no idea how you plan on doing that.
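
One possible way to supply the threshold is a small lookup table keyed by city. The sketch below is an assumption, not the asker's schema: the table names reviews and city_thresholds and all sample rows are hypothetical, and it runs through Python's sqlite3 module (SQLite 3.25 or later for lag()) instead of PostgreSQL. It shows the year * 12 + month comparison treating December to January as consecutive while rejecting a gap such as month 8 to month 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (business_id TEXT, city TEXT, year INTEGER, "
    "month INTEGER, review_count INTEGER)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "Charlotte", 2016, 12, 2),
        ("a", "Charlotte", 2017, 1, 3),   # Dec -> Jan is consecutive, both >= 2
        ("b", "Las Vegas", 2017, 8, 5),
        ("b", "Las Vegas", 2017, 10, 4),  # month 9 is missing: not consecutive
        ("c", "Las Vegas", 2017, 3, 3),
        ("c", "Las Vegas", 2017, 4, 2),   # consecutive, but 2 is below 3
    ],
)
# Hypothetical lookup table holding one threshold per city.
conn.execute("CREATE TABLE city_thresholds (city TEXT, threshold INTEGER)")
conn.executemany(
    "INSERT INTO city_thresholds VALUES (?, ?)",
    [("Charlotte", 2), ("Las Vegas", 3)],
)

query = """
SELECT DISTINCT r.business_id
FROM (
    SELECT business_id, city, year, month, review_count,
           LAG(year) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_year,
           LAG(month) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_month,
           LAG(review_count) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_review_count
    FROM reviews
) AS r
JOIN city_thresholds AS c ON c.city = r.city
WHERE r.review_count >= c.threshold
  AND r.prev_review_count >= c.threshold
  AND r.year * 12 + r.month = r.prev_year * 12 + r.prev_month + 1
ORDER BY r.business_id
"""
qualifying = [row[0] for row in conn.execute(query)]
print(qualifying)
```

The first row of each business has NULL lag values, so its comparisons evaluate to NULL and the row is dropped without any special handling.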
Please try...
SELECT business_id
FROM
(
SELECT business_id AS business_id,
LAG( business_id, -1 ) OVER ( ORDER BY business_id, year, month ) AS lag_in_business_id,
city,
( LAG( year, -1 ) OVER ( ORDER BY business_id, year, month ) * 12 + LAG( month, -1 ) OVER ( ORDER BY business_id, year, month ) ) - ( year * 12 + month ) AS diffInDates,
review_count AS review_count
FROM us_business_monthly_review_growth
order BY business_id,
year,
month
) tempTable
JOIN tblCityThresholds ON tblCityThresholds.city = tempTable.city
WHERE business_id = lag_in_business_id
AND diffInDates = 1
AND tblCityThresholds.threshold <= review_count
GROUP BY business_id;
In formulating this answer I first used the following code to test that LAG() performed as hoped...
SELECT business_id,
LAG( business_id, 1 ) OVER ( ORDER BY business_id, year, month ) AS lag_in_business_id,
year,
month,
LAG( year, 1 ) OVER ( ORDER BY business_id, year, month ) * 12 + LAG( month, 1 ) OVER ( ORDER BY business_id, year, month ) AS diffInDates
FROM mytable
ORDER BY business_id,
year,
month;
Here I was trying to get LAG() to refer to values on the next row, but the output showed that it refers to the previous row. Since I wanted to compare the current values with those of the next row (to see if the next record had the same business_id, etc.), I changed the 1 in LAG() to -1, giving me...
SELECT business_id,
LAG( business_id, -1 ) OVER ( ORDER BY business_id, year, month ) AS lag_in_business_id,
year,
month,
LAG( year, -1 ) OVER ( ORDER BY business_id, year, month ) * 12 + LAG( month, -1 ) OVER ( ORDER BY business_id, year, month ) AS diffInDates
FROM mytable
ORDER BY business_id,
year,
month;
As this gave me the desired results I added city, to allow a JOIN between the results and an assumed table holding the details of each city and its corresponding threshold. I chose the name tblCityThresholds as a suggestion since I am not sure what you have / would call it. This completed the inner SELECT statement.
I then joined the results of the inner SELECT statement to tblCityThresholds and refined the output as per your criteria. Note: it is assumed that the city field will always have a corresponding entry in tblCityThresholds.
I then used GROUP BY to ensure no repetition of a business_id.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://www.postgresql.org/docs/8.4/static/functions-window.html (regarding LAG())

SQL Server : getting year and month

I have a SQL Server table like this:
http://sqlfiddle.com/#!2/a15dd/1
What I want to do is display the latest year and month where trades were made.
In this case, I want to display
ID: 1
Year: 2013
Month: 11
Trades: 2
I've tried to use:
select
id, MAX(year), MAX(month)
from
ExampleTable
where
trades > 0
group by
id
Do I have to concatenate the columns?
You can use ROW_NUMBER to assign each row a number based on its relative position (as defined by your order by):
SELECT ID,
Year,
Month,
Trades,
RowNum = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Year DESC, Month DESC)
FROM ExampleTable
WHERE Trades > 0;
With your example data this gives:
ID YEAR MONTH TRADES RowNum
1 2013 11 2 1
1 2013 4 42 2
Then you can just limit this to where RowNum is 1:
SELECT ID, Year, Month, Trades
FROM ( SELECT ID,
Year,
Month,
Trades,
RowNum = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Year DESC, Month DESC)
FROM ExampleTable
WHERE Trades > 0
) AS t
WHERE t.RowNum = 1;
If, as in your example, Year and Month are stored as VARCHAR you will need to convert to an INT before ordering:
RowNum = ROW_NUMBER() OVER(PARTITION BY ID
ORDER BY
CAST(Year AS INT) DESC,
CAST(Month AS INT) DESC)
Example on SQL Fiddle
If you are only bothered about records where ID is 1, you can do it simply using TOP:
SELECT TOP 1 ID, Year, Month, Trades
FROM ExampleTable
WHERE ID = 1
AND Trades > 0
ORDER BY CAST(Year AS INT) DESC, CAST(MONTH AS INT) DESC;
Why store "year" and "month" as separate columns? In any case, the basic logic is to combine the two values to get the latest one. This is awkward because you are storing numbers as strings and the months are not zero-padded. But it is not so hard:
select id,
max(year + right('00' + month, 2))
from ExampleTable
group by id;
To separate them out:
select id,
left(max(year + right('00' + month, 2)), 4) as year,
right(max(year + right('00' + month, 2)), 2) as month
from ExampleTable
group by id;
Here is a SQL Fiddle. Note when you use SQL Fiddle that you should set the database to the correct database.
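
Why the zero padding matters can be checked with plain string comparison. The snippet below is an illustrative Python sketch, assuming year and month are stored as strings as in the question; the rows and the helper name are made up for the example.

```python
# Hypothetical rows: (id, year, month, trades), with year and month as strings.
rows = [
    ("1", "2012", "12", 5),
    ("1", "2013", "4", 42),
    ("1", "2013", "11", 2),
]


def year_month_key(year: str, month: str) -> str:
    """Mimic year + right('00' + month, 2): a sortable 'YYYYMM' string."""
    return year + ("00" + month)[-2:]


# MAX(year) and MAX(month) taken independently: as strings, '4' sorts after
# '12' and '11', so the month maximum is '4' rather than the latest month.
independent = (max(r[1] for r in rows), max(r[2] for r in rows))

# Concatenating without padding has the same problem: '20134' sorts after '201311'.
unpadded_latest = max(r[1] + r[2] for r in rows)

# With zero padding the string order matches the chronological order.
padded_latest = max(year_month_key(r[1], r[2]) for r in rows)
latest_year, latest_month = padded_latest[:4], padded_latest[-2:]

print(independent, unpadded_latest, padded_latest, latest_year, latest_month)
```

The same ordering problem is why the ROW_NUMBER answer casts Year and Month to INT before sorting.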
I'm not sure whether I get your question right, but shouldn't the following work?
SELECT TOP 1 year + '-' + month AS Last, trades
FROM ExampleTable
WHERE CAST(trades AS INTEGER) > 0
ORDER BY CAST(year AS integer) DESC, CAST(month AS integer) DESC
SQLFiddle
Try this
SELECT TOP 1 ID, [year], trades,
MAX(Convert(INT,[month])) OVER(PARTITION BY [year]) AS [Month]
FROM ExampleTable
WHERE trades > 0