Month and year from data_adm column in BigQuery - sql

I have a table with the column data_adm. I need to create a new "month_and_year" column in BigQuery. How can I do it?
SELECT data_adm,
PARSE_DATE('%Y%m', 'data_adm') AS year_and_month
FROM "table"
data_adm  |year_and_month
2022-12-05|2022-12

You can try the approach below:
with sample_data as (
select ('2022-12-05') as data_adm
)
select data_adm, FORMAT_DATE('%Y-%m', PARSE_DATE('%F', data_adm)) AS month_and_year from sample_data
Sample output:
data_adm  |month_and_year
2022-12-05|2022-12
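The parse-then-format idea can also be checked locally. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, so SQLite's strftime() takes the place of FORMAT_DATE(), and the table name sample_data is taken from the answer's CTE rather than from a real dataset.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here (an assumption for local testing).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_data (data_adm TEXT)")
conn.execute("INSERT INTO sample_data VALUES ('2022-12-05')")

# strftime('%Y-%m', ...) keeps only the year and month of an ISO date string,
# which is what FORMAT_DATE('%Y-%m', ...) does in the BigQuery answer.
month_rows = conn.execute(
    "SELECT data_adm, strftime('%Y-%m', data_adm) AS month_and_year "
    "FROM sample_data"
).fetchall()
print(month_rows)  # [('2022-12-05', '2022-12')]
```

If data_adm is already a DATE column in BigQuery, the PARSE_DATE step should not be needed and FORMAT_DATE can be applied to the column directly.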

Insert data from table into a new one with condition

Okay, so this has been bugging me the whole day. I have two tables (e.g. original_table and new_table). The new table is empty, and I need to populate it with records from original_table that meet the following conditions:
Trip duration must be at least 30 seconds
Include only stations which have at least 100 trips starting there
Include only stations which have at least 100 trips ending there
The duration part is easy, but I find it hard to filter the other two conditions.
I tried to make two temporary tables like so:
CREATE TEMP TABLE start_stations AS(
SELECT ARRAY(SELECT DISTINCT start_station_id FROM `dataset.original_table`
WHERE duration_sec >= 30
GROUP BY start_station_id
HAVING COUNT(start_station_id)>=100
AND COUNT(end_station_id)>=100) as arr
);
CREATE TEMP TABLE end_stations AS(
SELECT ARRAY(SELECT DISTINCT end_station_id FROM `dataset.original_table`
WHERE duration_sec >= 30
GROUP BY end_station_id
HAVING COUNT(end_station_id)>=100
AND COUNT(start_station_id)>=100) as arr
);
And then try to insert in the new_table like this:
INSERT INTO `dataset.new_table`
SELECT a.* FROM `dataset.original_table` as a, start_stations as ss,
end_stations as es
WHERE a.start_station_id IN UNNEST(ss.arr)
AND a.end_station_id IN UNNEST(es.arr)
However, this does not give me the right answer. I tried to write a temporary function to clean up the data, but I didn't get far. :(
Here's a sample of the table:
trip_id|duration_sec|start_date|start_station_id| end_date|end_station_id|
--------------------------------------------------------------------------|
afad333| 231|2017-12-20| 210|2017-12-20| 355|
sffde56| 35|2017-12-12| 355|2017-12-12| 210|
af33445| 333|2018-10-27| 650|2018-10-27| 650|
dd1238d| 456|2017-09-15| 123|2017-09-15| 210|
dsa2223| 500|2017-09-15| 210|2017-09-15| 123|
...
I will be very thankful if you can help me.
Thanks in advance!
One approach would be:
-- stations with at least 100 trips starting or ending there
with major_stations as(
select start_station_id station_id
from trips
group by start_station_id
having count(*) >= 100
union
select end_station_id station_id
from trips
group by end_station_id
having count(*) >= 100
)
select *
from trips
where start_station_id in (select station_id from major_stations)
and duration_sec >= 30
There may be an easier way, but this is the first approach I thought of.
So I found what my problem was. Since I must filter for stations where at least 100 trips started AND ended, the way I did it before was wrong.
The query that worked for me was this:
INSERT INTO dataset.new_table
WITH stations AS (
SELECT start_station_id, end_station_id FROM dataset.original_table
GROUP BY start_station_id, end_station_id
HAVING count(start_station_id)>=100
AND count(end_station_id)>=100
)
SELECT a.* FROM dataset.original_table AS a, stations as s
WHERE a.start_station_id = s.start_station_id
AND a.end_station_id = s.end_station_id
AND a.duration_sec >= 30
This way I create only one WITH clause, which filters the start AND end stations by the given criteria.
As easy as it looks, my brain obviously needs a rest sometimes, and a fresh start with a new perspective.
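For comparison, here is a sketch of one other reading of the three conditions, where the start-station count and the end-station count are computed separately instead of per (start, end) pair. It runs on Python's sqlite3 module rather than BigQuery. The rows are made up, the threshold is lowered from 100 to 2 so that a handful of rows is enough, and counting only trips of at least 30 seconds is an assumption carried over from the question's own attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = ("(trip_id TEXT, duration_sec INTEGER, "
          "start_station_id INTEGER, end_station_id INTEGER)")
conn.execute("CREATE TABLE original_table " + schema)
conn.execute("CREATE TABLE new_table " + schema)

# Hypothetical sample rows; t6 is shorter than 30 seconds.
conn.executemany("INSERT INTO original_table VALUES (?, ?, ?, ?)", [
    ("t1", 231, 210, 355),
    ("t2", 35, 355, 210),
    ("t3", 333, 210, 355),
    ("t4", 456, 123, 210),
    ("t5", 500, 210, 123),
    ("t6", 10, 355, 210),
])

conn.execute("""
    WITH valid AS (
        SELECT * FROM original_table WHERE duration_sec >= 30
    ),
    busy_starts AS (   -- stations with enough trips starting there
        SELECT start_station_id FROM valid
        GROUP BY start_station_id HAVING COUNT(*) >= :min_trips
    ),
    busy_ends AS (     -- stations with enough trips ending there
        SELECT end_station_id FROM valid
        GROUP BY end_station_id HAVING COUNT(*) >= :min_trips
    )
    INSERT INTO new_table
    SELECT * FROM valid
    WHERE start_station_id IN (SELECT start_station_id FROM busy_starts)
      AND end_station_id IN (SELECT end_station_id FROM busy_ends)
""", {"min_trips": 2})  # 2 instead of 100, only for this small demo

copied = [row[0] for row in
          conn.execute("SELECT trip_id FROM new_table ORDER BY trip_id")]
print(copied)  # ['t1', 't3']
```

Trip t5 is dropped because only one valid trip ends at station 123, even though its start station 210 is busy. Which reading is the right one depends on how the conditions were meant.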

Using 1 row of data to create up to 3 rows of output SQL

I'm trying to create a query in which there can be up to 3 rows of output from 1 row of input:
Any one of the following is not null (e_date/c_date/o_date) -> create 1 record in output
Any 2 of the following are not null (e_date/c_date/o_date) -> create 2 records in output
All 3 of the following date fields are not null (e_date/c_date/o_date) -> create 3 records in output
I've attached a picture below of an example for it. If someone could please help me with the code logic on this it'd be greatly appreciated.
You can use UNION to do this. For example:
select cch_id, e_date as event_time, 'e_type' as event_type
from t where e_date is not null
union
select cch_id, c_date, 'c_type' from t where c_date is not null
union
select cch_id, o_date, 'o_type' from t where o_date is not null
order by cch_id, event_time
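The query above can be tried on a few rows as follows. This is a sketch using Python's sqlite3 module with made-up data. It uses UNION ALL instead of UNION, which skips the duplicate-removal step; that is an assumption that exact duplicate rows, if any, should be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cch_id INTEGER, e_date TEXT, c_date TEXT, o_date TEXT)")
# Hypothetical rows: two dates set, none set, all three set.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "2020-01-01", None, "2020-03-01"),
    (2, None, None, None),
    (3, "2020-02-01", "2020-02-02", "2020-02-03"),
])

# One branch per date column; each branch emits a row only when its date is set.
events = conn.execute("""
    SELECT cch_id, e_date AS event_time, 'e_type' AS event_type
    FROM t WHERE e_date IS NOT NULL
    UNION ALL
    SELECT cch_id, c_date, 'c_type' FROM t WHERE c_date IS NOT NULL
    UNION ALL
    SELECT cch_id, o_date, 'o_type' FROM t WHERE o_date IS NOT NULL
    ORDER BY cch_id, event_time
""").fetchall()

for row in events:
    print(row)
# (1, '2020-01-01', 'e_type')
# (1, '2020-03-01', 'o_type')
# (3, '2020-02-01', 'e_type')
# (3, '2020-02-02', 'c_type')
# (3, '2020-02-03', 'o_type')
```

Row 2 produces no output because all three dates are null.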
You can use apply:
select cch_id, v.event_time, v.event_type
from t cross apply
(values ('e_type', e_date),
('c_type', c_date),
('o_type', o_date)
) v(event_type, event_time)
where v.event_time is not null;
You can of course include other columns.
APPLY implements functionality known as a lateral join. Unpivoting like this is only one example of what lateral joins can do. This is a good introduction to this functionality.

Query to display specific employee names who are present only one month but not in other months

I need a SQL query to display the names of employees who are present "only in January" but not in February, March and April.
Here are the table details...
create table Employee(Id numeric(18,0),Name varchar(255),Date date);
insert into Employee values(101,'Shiva','2018/01/01'),
(102,'Vamsi','2018/01/01'),
(103,'Rajesh','2018/01/01'),
(104,'Krishna','2018/01/01'),
(105,'Tarun','2018/02/01'),
(101,'Shiva','2018/02/01'),
(103,'Rajesh','2018/02/01'),
(106,'Kaala','2018/03/02'),
(107,'Azeez','2018/03/02'),
(103,'Rajesh','2018/03/02'),
(108,'Eswar','2018/04/02'),
(109,'Dora','2018/04/02'),
(110,'Akash','2018/04/02'),
(103,'Rajesh','2018/04/02');
Expected result should be as following:
Vamsi
Krishna
Hope I'll get an answer from you very soon.
Thanks.
Try this query:
select Name from Employee where MONTH(date)=1
In PostgreSQL you could try something like this. Here I select all employees that have no row in a month other than January.
SELECT Name
FROM Employee
WHERE Id NOT IN (SELECT Id FROM Employee WHERE date_part('month', Date) != 1);
A simple way is to use aggregation. If you only have data from one year:
select name
from Employee
group by name
having month(min(date)) = month(max(date)) and
month(min(date)) = 1;
If you want only January 2018:
select name
from Employee
group by name
having min(date) >= '2018-01-01' and
max(date) < '2018-02-01';
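The aggregation idea can be checked against the question's sample data. This is a sketch on Python's sqlite3 module: the dates are rewritten in ISO form (2018-01-01 instead of 2018/01/01) so that they compare correctly as text, and the grouping is by Id and Name rather than Name alone, which is an assumption that names are not guaranteed unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, Name TEXT, Date TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (101, "Shiva", "2018-01-01"), (102, "Vamsi", "2018-01-01"),
    (103, "Rajesh", "2018-01-01"), (104, "Krishna", "2018-01-01"),
    (105, "Tarun", "2018-02-01"), (101, "Shiva", "2018-02-01"),
    (103, "Rajesh", "2018-02-01"), (106, "Kaala", "2018-03-02"),
    (107, "Azeez", "2018-03-02"), (103, "Rajesh", "2018-03-02"),
    (108, "Eswar", "2018-04-02"), (109, "Dora", "2018-04-02"),
    (110, "Akash", "2018-04-02"), (103, "Rajesh", "2018-04-02"),
])

# An employee is "January only" when the earliest and the latest row
# both fall inside January 2018.
january_only = [name for (name,) in conn.execute("""
    SELECT Name
    FROM Employee
    GROUP BY Id, Name
    HAVING MIN(Date) >= '2018-01-01' AND MAX(Date) < '2018-02-01'
    ORDER BY Id
""")]
print(january_only)  # ['Vamsi', 'Krishna']
```

Shiva and Rajesh drop out because their latest row is after January, and the employees who only appear from February onwards fail the MAX(Date) test as well.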
Try this
SELECT Id,Name,[Date]
FROM
(
SELECT *,
COUNT(Name)OVER(Partition by Name,DATEPART(YEAR,[date]) ORDER BY id)
AS EmpOnlyOnceInMonth
FROM EmployeeData
)dt
WHERE EmpOnlyOnceInMonth=1 AND DATEPART(M,dt.[date])=1
ORDER BY Id
Result
Id Name Date
----------------------
102 Vamsi 2018-01-01
104 Krishna 2018-01-01

SQL find min & max range within dataset

I have a table with the following columns:
contactId (int)
interval (int)
date (smalldate)
small sample data:
1,120,'12/02/2010'
1,121,'12/02/2010'
1,122,'12/02/2010'
1,123,'12/02/2010'
1,145,'12/02/2010'
1,146,'12/02/2010'
1,147,'12/02/2010'
2,122,'12/02/2010'
2,123,'12/02/2010'
2,124,'12/02/2010'
2,320,'12/02/2010'
2,321,'12/02/2010'
2,322,'12/02/2010'
2,450,'12/02/2010'
2,451,'12/02/2010'
How, if at all, is it possible to get SQL to return the columns "contactId, minInterval, maxInterval, date", e.g.
1,120,123,'12/02/2010'
1,145,147,'12/02/2010'
2,122,124,'12/02/2010'
2,320,322,'12/02/2010'
2,450,451,'12/02/2010'
Hopefully this makes sense. Basically I'm looking to figure out the min/max range of the intervals by provider and date for each run where they increment by one. Once there is a break in the increment (i.e. a step of more than one), that indicates a new min/max range.
Any help is greatly appreciated :)
Here is my exact SQL table setup:
create table availability
(
Id int,
ProviderId int,
IntervalId int,
Date date
)
sample data
providerid,intervalid,date
1128,108,2010-12-27
1128,109,2010-12-27
1128,110,2010-12-27
1128,111,2010-12-27
1128,112,2010-12-27
1128,113,2010-12-27
1128,114,2010-12-27
1128,120,2010-12-27
1128,121,2010-12-27
1128,122,2010-12-27
1128,123,2010-12-27
1128,124,2010-12-27
1128,125,2010-12-27
1213,108,2010-12-27
1213,109,2010-12-27
1213,110,2010-12-27
1213,111,2010-12-27
1213,112,2010-12-27
1213,113,2010-12-27
1213,114,2010-12-27
1213,115,2010-12-27
1213,232,2010-12-27
1213,233,2010-12-27
1213,234,2010-12-27
3954,198,2010-12-27
3954,199,2010-12-27
3954,200,2010-12-27
3954,201,2010-12-27
3954,202,2010-12-27
3954,203,2010-12-27
3954,204,2010-12-27
3954,205,2010-12-27
3954,206,2010-12-27
3954,207,2010-12-27
3954,208,2010-12-27
3954,209,2010-12-27
3954,210,2010-12-27
3954,211,2010-12-27
3954,212,2010-12-27
3954,213,2010-12-27
3954,214,2010-12-27
3954,215,2010-12-27
3954,216,2010-12-27
3954,217,2010-12-27
3954,218,2010-12-27
3954,229,2010-12-27
3954,230,2010-12-27
3954,231,2010-12-27
3954,232,2010-12-27
3954,233,2010-12-27
3954,234,2010-12-27
1128,108,2010-12-28
1128,109,2010-12-28
1128,110,2010-12-28
1128,111,2010-12-28
1128,112,2010-12-28
1128,113,2010-12-28
1128,114,2010-12-28
1128,115,2010-12-28
1128,116,2010-12-28
3954,186,2010-12-28
3954,187,2010-12-28
3954,188,2010-12-28
3954,189,2010-12-28
3954,190,2010-12-28
3954,213,2010-12-28
3954,214,2010-12-28
3954,215,2010-12-28
3954,216,2010-12-28
3954,217,2010-12-28
3954,218,2010-12-28
3954,219,2010-12-28
3954,220,2010-12-28
3954,221,2010-12-28
3954,222,2010-12-28
Sample result using the current SQL from the answers:
1062,180,180,2010-12-20
1062,179,179,2010-12-20
1062,178,178,2010-12-20
1062,177,177,2010-12-20
1062,176,176,2010-12-20
1062,175,175,2010-12-20
1062,174,174,2010-12-20
1062,173,173,2010-12-20
1062,172,172,2010-12-20
1062,171,171,2010-12-20
1062,170,170,2010-12-20
1062,169,169,2010-12-20
1062,168,168,2010-12-20
1062,167,167,2010-12-20
1062,166,166,2010-12-20
1062,165,165,2010-12-20
1062,164,164,2010-12-20
1062,163,163,2010-12-20
1062,162,162,2010-12-20
1062,161,161,2010-12-20
1062,160,160,2010-12-20
1062,159,159,2010-12-20
1062,158,158,2010-12-20
1062,157,157,2010-12-20
1062,156,156,2010-12-20
1062,155,155,2010-12-20
1062,154,154,2010-12-20
1062,153,153,2010-12-20
1062,152,152,2010-12-20
1062,151,151,2010-12-20
1062,150,150,2010-12-20
1062,149,149,2010-12-20
1062,148,148,2010-12-20
1062,147,147,2010-12-20
1062,146,146,2010-12-20
1062,145,145,2010-12-20
1062,144,144,2010-12-20
1062,143,143,2010-12-20
1062,142,142,2010-12-20
1062,141,141,2010-12-20
1062,140,140,2010-12-20
1062,139,139,2010-12-20
1062,138,138,2010-12-20
1062,137,137,2010-12-20
1062,136,136,2010-12-20
1062,135,135,2010-12-20
1062,134,134,2010-12-20
1062,133,133,2010-12-20
1062,132,132,2010-12-20
1062,131,131,2010-12-20
1062,130,130,2010-12-20
1062,129,129,2010-12-20
1062,128,128,2010-12-20
1062,127,127,2010-12-20
1062,126,126,2010-12-20
1062,125,125,2010-12-20
1062,124,124,2010-12-20
1062,123,123,2010-12-20
1062,122,122,2010-12-20
1062,121,121,2010-12-20
1062,120,120,2010-12-20
1062,119,119,2010-12-20
1062,118,118,2010-12-20
1062,117,117,2010-12-20
1062,116,116,2010-12-20
1062,115,115,2010-12-20
1062,114,114,2010-12-20
1062,113,113,2010-12-20
1062,112,112,2010-12-20
In SQL Server, Oracle and PostgreSQL:
WITH q AS
(
SELECT t.*, interval - ROW_NUMBER() OVER (PARTITION BY contactID, date ORDER BY interval) AS sr
FROM mytable t
)
SELECT contactID, date, MIN(interval), MAX(interval)
FROM q
GROUP BY
date, contactID, sr
ORDER BY
date, contactID, sr
Update:
With your test data I get this output:
WITH mytable (providerId, intervalId, date) AS
(
SELECT 1128,108,'2010-12-27' UNION ALL
SELECT 1128,109,'2010-12-27' UNION ALL
SELECT 1128,110,'2010-12-27' UNION ALL
SELECT 1128,111,'2010-12-27' UNION ALL
SELECT 1128,112,'2010-12-27' UNION ALL
SELECT 1128,113,'2010-12-27' UNION ALL
SELECT 1128,114,'2010-12-27' UNION ALL
SELECT 1128,120,'2010-12-27' UNION ALL
SELECT 1128,121,'2010-12-27' UNION ALL
SELECT 1128,122,'2010-12-27' UNION ALL
SELECT 1128,123,'2010-12-27' UNION ALL
SELECT 1128,124,'2010-12-27' UNION ALL
SELECT 1128,125,'2010-12-27'
),
q AS
(
SELECT t.*, intervalId - ROW_NUMBER() OVER (PARTITION BY providerId, date ORDER BY intervalId) AS sr
FROM mytable t
)
SELECT providerId, date, MIN(intervalId), MAX(intervalId)
FROM q
GROUP BY
date, providerId, sr
ORDER BY
date, providerId, sr
1128 2010-12-27 108 114
1128 2010-12-27 120 125
i.e. exactly what you were after.
Are you sure you are using the query correctly? Are you having duplicates on (providerId, intervalId, date)?
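The reason the query works is that, within one run of consecutive intervals, the interval value and its row number grow in step, so their difference stays constant and can serve as a group key. A sketch on the question's first sample, using Python's sqlite3 module (window functions need SQLite 3.25 or later). The column names contact_id, interval_id and day are renamed here for illustration and are not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE availability (contact_id INTEGER, interval_id INTEGER, day TEXT)"
)
sample = {
    1: [120, 121, 122, 123, 145, 146, 147],
    2: [122, 123, 124, 320, 321, 322, 450, 451],
}
conn.executemany(
    "INSERT INTO availability VALUES (?, ?, ?)",
    [(cid, iv, "12/02/2010") for cid, ivs in sample.items() for iv in ivs],
)

# interval_id - ROW_NUMBER() is the same for every row of a consecutive run,
# and jumps to a new value after each gap.
ranges = conn.execute("""
    WITH q AS (
        SELECT contact_id, interval_id, day,
               interval_id - ROW_NUMBER() OVER (
                   PARTITION BY contact_id, day ORDER BY interval_id
               ) AS grp
        FROM availability
    )
    SELECT contact_id, MIN(interval_id), MAX(interval_id), day
    FROM q
    GROUP BY contact_id, day, grp
    ORDER BY contact_id, MIN(interval_id)
""").fetchall()

for row in ranges:
    print(row)
# (1, 120, 123, '12/02/2010')
# (1, 145, 147, '12/02/2010')
# (2, 122, 124, '12/02/2010')
# (2, 320, 322, '12/02/2010')
# (2, 450, 451, '12/02/2010')
```

If the same (provider, interval, date) can occur more than once, ROW_NUMBER() keeps counting across the duplicates and splits the runs; using DENSE_RANK() in its place gives duplicates the same rank and keeps the difference constant.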
It's probably possible to do this with a SQL query alone, but it will probably be a bit mind-boggling: basically a subquery to find places where the value increments by one, joined to the original dataset, with tons of other logic in there. That's my impression at least.
If I were you:
If this is a one-time deal, don't care about performance and just iterate over it and do the calculation 'manually'.
If this is a production dataset and you need to do this operation on a frequent / automated / performance-intensive setting, then rearrange the original dataset to make this kind of query easier.
Hope one of those options is available to you.

SQL query ...multiple max value selection. Help needed

Business World 1256987 monthly 10 2009-10-28
Business World 1256987 monthly 10 2009-09-23
Business World 1256987 monthly 10 2009-08-18
Linux 4 U 456734 monthly 25 2009-12-24
Linux 4 U 456734 monthly 25 2009-11-11
Linux 4 U 456734 monthly 25 2009-10-28
I get this result with the query:
SELECT DISTINCT ljm.journelname, ljm.subscription_id,
ljm.frequency, ljm.publisher, ljm.price, ljd.receipt_date
FROM lib_journals_master ljm,
lib_subscriptionhistory lsh,
lib_journal_details ljd
WHERE ljd.journal_id=ljm.id
ORDER BY ljm.publisher
What I need is the latest date for each journal.
I tried this query:
SELECT DISTINCT ljm.journelname, ljm.subscription_id,
ljm.frequency, ljm.publisher, ljm.price,ljd.receipt_date
FROM lib_journals_master ljm,
lib_subscriptionhistory lsh,
lib_journal_details ljd
WHERE ljd.journal_id=ljm.id
AND ljd.receipt_date = (
SELECT max(ljd.receipt_date)
from lib_journal_details ljd)
But it gives me the maximum from the entire column. The result I need will have two dates (the maximum for each magazine), but this query gives me only one.
You could change the WHERE clause to look up the last date for each journal:
AND ljd.receipt_date = (
SELECT max(subljd.receipt_date)
from lib_journal_details subljd
where subljd.journal_id = ljd.journal_id)
Make sure to give the table in the subquery a different alias from the table in the main query.
You should use GROUP BY if you need the max date.
Should look something like this:
SELECT
ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price
, MAX(ljd.receipt_date)
FROM
lib_journals_master ljm
, lib_subscriptionhistory lsh
, lib_journal_details ljd
WHERE
ljd.journal_id=ljm.id
GROUP BY
ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price
Something like this should work for you.
SELECT ljm.journelname
, ljm.subscription_id
, ljm.frequency
, ljm.publisher
, ljm.price
,md.max_receipt_date
FROM lib_journals_master ljm
, ( SELECT journal_id
, max(receipt_date) as max_receipt_date
FROM lib_journal_details
GROUP BY journal_id) md
WHERE ljm.id = md.journal_id
/
Note that I have removed the tables from the FROM clause which don't contribute anything to the query. You may need to put them back if you simplified your scenario for our benefit.
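The derived-table version above can be tried on the question's sample rows. This is a sketch on Python's sqlite3 module; the two-table layout (a master row per journal, a detail row per receipt) and the journal ids 1 and 2 are assumptions inferred from the join condition, since the real schema is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lib_journals_master "
    "(id INTEGER, journelname TEXT, subscription_id INTEGER, frequency TEXT, price INTEGER)"
)
conn.execute("CREATE TABLE lib_journal_details (journal_id INTEGER, receipt_date TEXT)")
conn.executemany("INSERT INTO lib_journals_master VALUES (?, ?, ?, ?, ?)", [
    (1, "Business World", 1256987, "monthly", 10),
    (2, "Linux 4 U", 456734, "monthly", 25),
])
conn.executemany("INSERT INTO lib_journal_details VALUES (?, ?)", [
    (1, "2009-10-28"), (1, "2009-09-23"), (1, "2009-08-18"),
    (2, "2009-12-24"), (2, "2009-11-11"), (2, "2009-10-28"),
])

# The subquery reduces the detail table to one row per journal (its latest
# receipt date) before the join, so each journal appears exactly once.
latest = conn.execute("""
    SELECT ljm.journelname, ljm.subscription_id, md.max_receipt_date
    FROM lib_journals_master ljm
    JOIN (SELECT journal_id, MAX(receipt_date) AS max_receipt_date
          FROM lib_journal_details
          GROUP BY journal_id) md
      ON md.journal_id = ljm.id
    ORDER BY ljm.journelname
""").fetchall()

for row in latest:
    print(row)
# ('Business World', 1256987, '2009-10-28')
# ('Linux 4 U', 456734, '2009-12-24')
```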
Separate this into two queries. The first gets each journal name and its latest date:
create table #table (journalName varchar(255), saleDate datetime)
insert into #table
select journalName,max(saleDate) from JournalTable group by journalName
Then select all the fields you need from your table and join #table to it on journalName.
Sounds like a top-of-group problem. You can use a CTE in SQL Server:
;WITH journeldata AS
(
SELECT
ljm.journelname
,ljm.subscription_id
,ljm.frequency
,ljm.publisher
,ljm.price
,ljd.receipt_date
,ROW_NUMBER() OVER (PARTITION BY ljm.journelname ORDER BY ljd.receipt_date DESC) AS RowNumber
FROM
lib_journals_master ljm
,lib_subscriptionhistory lsh
,lib_journal_details ljd
WHERE
ljd.journal_id=ljm.id
AND ljm.subscription_id = ljm.subscription_id
)
SELECT
journelname
,subscription_id
,frequency
,publisher
,price
,receipt_date
FROM journeldata
WHERE RowNumber = 1