How to manipulate results from BigQuery

I have the following query that I've been using, but every time I run it I have to rename a few values by hand to tidy them up. I don't know how I could apply REPLACE or REGEXP_REPLACE here.
For example, with the query below, I get something like:
Row  orderTotal  partners   data
1    100         partner_b  01/01/2021
And I end up needing to rename "partner_b" to "Partner B".
EXTRACT (date from CreationDateBR) as data,
FROM XXXXXXXXXXXXXXXXXXXXXX
WHERE CreationDateBR BETWEEN '2021-01-01' AND '2022-01-01'
AND loja IN ('Marketplace A')
AND parceiros NOT IN ('partner_a','partner_c')
GROUP BY partners,data
ORDER BY data,partners asc

Use the INITCAP() function, for example:
with data as (
select 'partner_b' partner union all
select 'partner_c' union all
select 'partner d'
)
select partner, initcap(partner)
from data
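With BigQuery's default delimiters this returns Partner_B, Partner_C and Partner D. INITCAP capitalises the letter after an underscore but keeps the underscore itself, so to get "Partner B" replace it first: initcap(replace(partner, '_', ' ')).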
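The rename the question asks for (partner_b to Partner B) amounts to swapping underscores for spaces and capitalising each word. A minimal Python sketch of that transformation, handy for checking which labels to expect; the helper name prettify_partner is made up for this example, and this is an analogue of the SQL rather than BigQuery code:

```python
def prettify_partner(raw: str) -> str:
    """Turn a key such as 'partner_b' into a display label such as 'Partner B'."""
    # Replace underscores with spaces first, then capitalise each word.
    return raw.replace("_", " ").title()


labels = [prettify_partner(p) for p in ["partner_b", "partner_c", "partner d"]]
print(labels)
```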

Related

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS organic_follower_gain,
null AS paid_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100

UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through rows in SQL because it is not a procedural language. The operation you define in the query is applied to every row in the table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account_id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
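The set-based idea can be tried on a small scale with a correlated subquery that subtracts a running total of gains from each account's earliest live count. The sketch below runs in SQLite through Python rather than Snowflake, uses made-up sample rows, and assumes each historical row should show the earliest live count minus every gain dated on or after that row's date; adjust the date comparison if the gains are recorded differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follower_counts (account_id INTEGER, date TEXT, total_followers_count INTEGER);
CREATE TABLE follower_gains (account_id INTEGER, date TEXT,
                             organic_follower_gain INTEGER, paid_follower_gain INTEGER);
-- Made-up sample rows: one live count per account plus earlier daily gains.
INSERT INTO follower_counts VALUES (1, '2020-04-04', 1000), (2, '2020-04-04', 500);
INSERT INTO follower_gains VALUES
  (1, '2020-04-01', 5, 1), (1, '2020-04-02', 10, 0), (1, '2020-04-03', 3, 1),
  (2, '2020-04-03', 2, 3);
""")

backfill_sql = """
SELECT g.account_id,
       g.date,
       -- Earliest live count for this account ...
       (SELECT c.total_followers_count
          FROM follower_counts c
         WHERE c.account_id = g.account_id
         ORDER BY c.date
         LIMIT 1)
       -- ... minus every gain dated on or after this historical day.
       - (SELECT SUM(g2.organic_follower_gain + g2.paid_follower_gain)
            FROM follower_gains g2
           WHERE g2.account_id = g.account_id
             AND g2.date >= g.date) AS total_followers_count
  FROM follower_gains g
 ORDER BY g.account_id, g.date
"""
backfilled = conn.execute(backfill_sql).fetchall()
for row in backfilled:
    print(row)
```

On Snowflake, a window function such as SUM(...) OVER (PARTITION BY account_id ORDER BY date DESC) would likely be the more natural way to compute the same running total.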

You could do something like this. Instead of using the variable, you can wrap the whole expression in another pair of parentheses and write ) AS FollowerGrowth at the end.
DECLARE @FollowerGrowth INT =
( SELECT total_followers_count
FROM follower_gains
WHERE AccountID = xx )
-
( SELECT TOP 1 follower_count
FROM follower_counts
WHERE AccountID = xx
ORDER BY date ASC )

Why do I not see a sample data chart when I pass a date parameter in Pentaho CDE?

I have created a CDE in which I defined two date parameters, one for the start date and one for the end date. I created the date input selectors, and in the SQL I put the following:
select distinct
c.country,
sum(c.creditlimit) AS total_credit_limit
FROM
customers c,
orderfact o
WHERE
c.customernumber=o.customernumber
AND
(
to_char(ORDERDATE,'YYYY-MM-DD')>=${param_start_date}
AND
to_char(orderdate,'YYYY-MM-DD')<=${param_end_date}
)
GROUP BY
c.country
ORDER BY
total_credit_limit ASC

Unpivot date columns to a single column of a complex query in Oracle

Hi guys, I am stuck on a stubborn problem that I am unable to solve. I am trying to compile a report in which the dates coming from different tables need to go into a single date field. Of course, the max (most recent) date across all these date columns is the one that should go into that single date column. The report would be generated for multiple users across multiple branches/courses.
There are multiple blogs, and the latest date needs to be grouped by blog title, i.e. max(date_value) from the six date columns should give the greatest (latest) date for that blog title.
Expected Result:
select u.batch_uid as ext_person_key, u.user_id, cm.batch_uid as ext_crs_key, cm.crs_id,
ir.role_id as insti_role,
(CASE when b.JOURNAL_IND = 'N' then 'BLOG' else 'JOURNAL' end) as item_type,
gm.title as item_name, gm.disp_title as ITEM_DISP_NAME,
be.blog_pk1 as be_blogPk1, bc.blog_entry_pk1 as bc_blog_entry_pk1, bc.pk1,
b.ENTRY_mod_DATE as b_ENTRY_mod_DATE ,b.CMT_mod_DATE as BlogCmtModDate, be.CMT_mod_DATE as be_cmnt_mod_Date,
b.UPDATE_DATE as BlogUpDate, be.UPDATE_DATE as be_UPDATE_DATE,
bc.creation_date as bc_creation_date,
be.CREATOR_USER_ID as be_CREATOR_USER_ID , bc.creator_user_id as bc_creator_user_id,
b.TITLE as BlogTitle, be.TITLE as be_TITLE,
be.DESCRIPTION as be_DESCRIPTION, bc.DESCRIPTION as bc_DESCRIPTION
FROM users u
INNER JOIN insti_roles ir on u.insti_roles_pk1 = ir.pk1
INNER JOIN crs_users cu ON u.pk1 = cu.users_pk1
INNER JOIN crs_mast cm on cu.crsmast_pk1 = cm.pk1
INNER JOIN blogs b on b.crsmast_pk1 = cm.pk1
INNER JOIN blog_entry be on b.pk1=be.blog_pk1 AND be.creator_user_id = cu.pk1
LEFT JOIN blog_CMT bc on be.pk1=bc.blog_entry_pk1 and bc.CREATOR_USER_ID=cu.pk1
JOIN gradeledger_mast gm ON gm.crsmast_pk1 = cm.pk1 and b.grade_handler = gm.linkId
WHERE cu.ROLE='S' AND BE.STATUS='2' AND B.ALLOW_GRADING='Y' AND u.row_status='0'
AND u.available_ind ='Y' and cm.row_status='0' and u.batch_uid='userA_157'
I am getting a result set for the above query with multiple date columns, which I want to put into a single column. The dates have to be the most recent, i.e. the max of the dates in the date columns.
I have successfully done the unpivot by using a view to store the above result set and putting all the dates in one column. However, I do not want to use a view or a table to store the result set and then unpivot it, simply because I cannot keep creating views for every user one would query for.
The max(date_value) from the date columns needs to be put in one single column. The columns are as follows:
1) b.entry_mod_date, 2) b.cmt_mod_date, 3) be.cmt_mod_date, 4) b.update_date, 5) be.update_date, 6) bc.creation_date
Apologies that I could not provide the desc of all the tables and the fields being used.
Any help in getting the max of the dates from these multiple date columns into a single column without using a view or a table would be greatly appreciated.

It is not clear what results you want, but the easiest solution is to use greatest().
with t as (
YOURQUERYHERE
)
select t.*,
greatest(b_ENTRY_mod_DATE, BlogCmtModDate, be_cmnt_mod_Date, BlogUpDate,
be_UPDATE_DATE, bc_creation_date
) as greatestdate
from t;
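One thing to watch with this approach: Oracle's GREATEST returns NULL as soon as any argument is NULL, and bc comes from a LEFT JOIN, so bc_creation_date can be NULL. The sketch below reproduces that behaviour in SQLite through Python, where the multi-argument MAX() has the same NULL rule and stands in for GREATEST. The table, rows and fallback date are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (blogtitle TEXT, b_entry_mod_date TEXT,
                be_update_date TEXT, bc_creation_date TEXT);
-- Made-up rows; the second has no comment, as a LEFT JOIN miss would produce.
INSERT INTO t VALUES
  ('Blog A', '2014-01-05', '2014-02-01', '2014-01-20'),
  ('Blog B', '2014-03-01', '2014-02-10', NULL);
""")

# Multi-argument MAX() yields NULL if any argument is NULL, like GREATEST in Oracle.
naive = conn.execute(
    "SELECT blogtitle, MAX(b_entry_mod_date, be_update_date, bc_creation_date) "
    "FROM t ORDER BY blogtitle"
).fetchall()

# COALESCE swaps NULL for a date older than any real value so it can never win.
safe = conn.execute(
    "SELECT blogtitle, MAX(b_entry_mod_date, be_update_date, "
    "COALESCE(bc_creation_date, '0001-01-01')) FROM t ORDER BY blogtitle"
).fetchall()

print(naive)
print(safe)
```

In Oracle the same guard would be written with COALESCE or NVL around each nullable date column, using a DATE literal as the fallback.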

select <columns>,
case
when greatest (b_ENTRY_mod_DATE) >= greatest (BlogCmtModDate) and greatest(b_ENTRY_mod_DATE) >= greatest(BlogUpDate)
then greatest( b_ENTRY_mod_DATE )
--<same implementation to compare each time BlogCmtModDate and BlogUpDate separately to get the greatest then 'date'>
end as date_value
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
case
when greatest (be_cmnt_mod_Date) >= greatest (be_UPDATE_DATE)
then greatest( be_cmnt_mod_Date )
when greatest (be_UPDATE_DATE) >= greatest (be_cmnt_mod_Date)
then greatest( be_UPDATE_DATE )
end as date_value
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
GREATEST(bc_creation_date) as date_value
,<columns>
FROM table
<rest of the query>

TSQL select clause includes more data than needed

I have a query that is used to pull some data, but when I join another table, it duplicates my results for every record it has joined on in the other table.
I'm sure this is a simple issue I am overlooking, but I can't seem to get it.
My query is here:
SELECT A.[id],
A.[subject],
A.[description],
CONVERT(VARCHAR(17), A.[startTime], 100) as startTime,
CONVERT(VARCHAR(17), A.[endTime], 100) as endTime,
A.[whoCreated],
A.[center],
B.[FirstName],
B.[LastName],
B.[ntid] as empNTID,
C.[centerName],
D.[employee],
E.[segmentID]
FROM Focus_Meetings AS A
JOIN empTable as B
ON A.[whoCreated] = B.[empID]
JOIN Focus_Centers as C
ON A.[center] = C.[id]
JOIN Focus_Attendees as D
ON D.[meetingID] = A.[id]
JOIN Focus_Meetings_Segments as E
ON E.[meetingID] = A.[id]
WHERE
(CAST(A.startTime AS DATE) = CAST(COALESCE(@meetingDate, A.startTime) AS DATE) OR
CAST(A.endTime AS DATE) = CAST(COALESCE(@meetingDate, A.endTime) AS DATE) OR
(E.[segmentID] IN( SELECT ParamValues.x2.value('segment[1]', 'INT')
FROM @meetingSegment.nodes('/segments/theSegment') AS ParamValues(x2))
)
)
FOR XML PATH ('details'), TYPE, ELEMENTS, ROOT ('root');
There is 1 record in the Focus_Meetings table and 5 records in Focus_Meetings_Segments.
My result should only be the one meeting, but it's giving a record for every D.[employee] and E.[segmentID].
I assume that's how it's supposed to work with my query, but that's not my intent.
There are 5 segments attached to the meeting in Focus_Meetings_Segments, and when I search for one of them, it should only be showing me the meeting 1 time, not once for each segment.

You are correct that this is how your query is supposed to work. This is a common problem that many people new to JOINS run into.
Essentially, you are currently asking SQL Server to return every set of data based on your JOINS and that is what it is doing. It sounds like what you want is for it to arbitrarily drop records from the result set.
Consider the following simplified version of your result set:
Subject | Description | SegmentId
-----------------------------------------
Whatever | Some desc... | 1
Whatever | Some desc... | 2
Based on your description, you only want the Whatever | Some desc... portion of the results to display one time.
If that is what you want to do, you have a couple of options.
1. Stop selecting the data (SegmentId) that is causing the records to show twice and only select distinct records.
SELECT DISTINCT Subject, Description...
2. Specify an aggregate function on the data that is causing records to show twice and group by the rest.
SELECT Subject, Description, MAX(SegmentId)... GROUP BY Subject, Description
You should also evaluate exactly what you need to select vs. what you are selecting. If you are arbitrarily selecting the SegmentId then you probably don't need it in the first place.
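To see the row multiplication and both fixes side by side, here is a small sketch that runs in SQLite through Python instead of SQL Server. The meetings and segments tables are simplified, made-up stand-ins for Focus_Meetings and Focus_Meetings_Segments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meetings (id INTEGER, subject TEXT, description TEXT);
CREATE TABLE segments (meeting_id INTEGER, segment_id INTEGER);
INSERT INTO meetings VALUES (1, 'Whatever', 'Some desc...');
INSERT INTO segments VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5);
""")

join_sql = "FROM meetings m JOIN segments s ON s.meeting_id = m.id"

# Selecting the segment column yields one row per matching segment.
fanned_out = conn.execute(
    f"SELECT m.subject, m.description, s.segment_id {join_sql}"
).fetchall()

# Option 1: drop the segment column and keep distinct rows.
distinct_rows = conn.execute(
    f"SELECT DISTINCT m.subject, m.description {join_sql}"
).fetchall()

# Option 2: aggregate the segment column and group by the rest.
grouped_rows = conn.execute(
    f"SELECT m.subject, m.description, MAX(s.segment_id) {join_sql} "
    "GROUP BY m.subject, m.description"
).fetchall()

print(len(fanned_out), distinct_rows, grouped_rows)
```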

when I search one of them it should only be showing me the meeting 1 time, not once for each segment
Then take Segment and Employee out of the main query and use a subquery:
SELECT A.[id],
A.[subject],
A.[description],
CONVERT(VARCHAR(17), A.[startTime], 100) as startTime,
CONVERT(VARCHAR(17), A.[endTime], 100) as endTime,
A.[whoCreated],
A.[center],
B.[FirstName],
B.[LastName],
B.[ntid] as empNTID,
C.[centerName]
FROM Focus_Meetings AS A
JOIN empTable as B
ON A.[whoCreated] = B.[empID]
JOIN Focus_Centers as C
ON A.[center] = C.[id]
WHERE A.[id] IN
(
SELECT E.[meetingID]
FROM Focus_Meetings_Segments as E
WHERE
(CAST(A.startTime AS DATE) = CAST(COALESCE(@meetingDate, A.startTime) AS DATE) OR
CAST(A.endTime AS DATE) = CAST(COALESCE(@meetingDate, A.endTime) AS DATE) OR
(E.[segmentID] IN( SELECT ParamValues.x2.value('segment[1]', 'INT')
FROM @meetingSegment.nodes('/segments/theSegment') AS ParamValues(x2))
)
)
)
FOR XML PATH ('details'), TYPE, ELEMENTS, ROOT ('root');
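The reason the subquery form returns each meeting once is that IN only tests membership, so several matching segments cannot multiply the outer rows. A reduced sketch of that pattern, run in SQLite through Python with made-up tables and values (the XML parameter parsing from the original is replaced by two plain bind parameters):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meetings (id INTEGER, subject TEXT);
CREATE TABLE segments (meeting_id INTEGER, segment_id INTEGER);
INSERT INTO meetings VALUES (1, 'Kickoff'), (2, 'Review');
INSERT INTO segments VALUES (1, 10), (1, 20), (1, 30), (2, 40);
""")

# Both requested segments belong to meeting 1.
wanted_segments = (10, 20)
rows = conn.execute(
    """
    SELECT m.id, m.subject
      FROM meetings m
     WHERE m.id IN (SELECT s.meeting_id
                      FROM segments s
                     WHERE s.segment_id IN (?, ?))
    """,
    wanted_segments,
).fetchall()
print(rows)
```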

SQL: Using UNION

Here is the question and database info.
Use the UNION command to prepare a full statement for customer 'C001' - it should be laid out as follows. (Note that the values shown below are not correct.) You may be able to use '' or NULL for blank values - if necessary use 0.
Here is a link to the webpage with the database info: http://sqlzoo.net/5_0.htm
Here is what I have tried:
SELECT sdate AS LineDate, "delivery" AS LEGEND, price*quantity AS Total,"" AS Amount
FROM shipped
JOIN product ON (shipped.product=product.id)
WHERE badguy='C001'
UNION
SELECT rdate,notes, "",receipt.amount
FROM receipt
WHERE badguy='C001'
Here is what I get back:
Wrong Answer. The correct answer has 5 row(s).
The amounts don't seem right in the amount column and I can't figure out how to order the data by the date since it is using two different date columns (sdate and rdate which are UNIONED).

It looks like the data in the example is aggregated by date and charge type using GROUP BY, which is why you are getting too many rows.
Also, you can sort by the alias of the column (LineDate) and the order by clause will apply to all the rows in the union.
SELECT sdate AS LineDate, "delivery" AS LEGEND, SUM(price*quantity) AS Total,"" AS Amount
FROM shipped
JOIN product ON (shipped.product=product.id)
WHERE badguy='C001'
GROUP BY sdate
UNION
SELECT rdate, notes, "",receipt.amount
FROM receipt
WHERE badguy='C001'
ORDER BY LineDate

It's usually easiest to develop each part of the union separately. Pay attention to the use of "null" to separate the monetary columns. The first select gets to name the columns.
select s.sdate as tr_date, 'Delivery' as type, sum((s.quantity * p.price)) as extended_price, null as amount
from shipped s
inner join product p on p.id = s.product
where badguy = 'C001'
group by s.sdate
union all
select rdate, notes, null, sum(amount)
from receipt
where badguy = 'C001'
group by rdate, notes
order by tr_date
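To check the shape of such a statement without the sqlzoo site, the same query can be run against a few made-up rows in SQLite through Python. The table layouts below are guessed from the column names used in the queries above, and the data is invented, so the numbers only illustrate the layout, not the expected sqlzoo answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER, price REAL);
CREATE TABLE shipped (badguy TEXT, product INTEGER, quantity INTEGER, sdate TEXT);
CREATE TABLE receipt (badguy TEXT, amount REAL, rdate TEXT, notes TEXT);
INSERT INTO product VALUES (1, 10.0), (2, 5.0);
INSERT INTO shipped VALUES
  ('C001', 1, 2, '2020-01-05'), ('C001', 2, 1, '2020-01-05'),
  ('C001', 1, 1, '2020-01-20'), ('C002', 1, 9, '2020-01-06');
INSERT INTO receipt VALUES
  ('C001', 20.0, '2020-01-10', 'Cheque'), ('C002', 1.0, '2020-01-11', 'Cash');
""")

# Deliveries fill extended_price, receipts fill amount; NULL pads the other column.
statement = conn.execute("""
SELECT s.sdate AS tr_date, 'Delivery' AS type,
       SUM(s.quantity * p.price) AS extended_price, NULL AS amount
  FROM shipped s
  JOIN product p ON p.id = s.product
 WHERE s.badguy = 'C001'
 GROUP BY s.sdate
UNION ALL
SELECT rdate, notes, NULL, SUM(amount)
  FROM receipt
 WHERE badguy = 'C001'
 GROUP BY rdate, notes
 ORDER BY tr_date
""").fetchall()
for line in statement:
    print(line)
```

The ORDER BY at the end applies to the combined result and refers to the alias given in the first SELECT.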
