SQL - Parse a field and SUM numbers at regular delimiter intervals - sql

I request your help for an issue beyond my current skills...
I'm using Google Big Query to store analytics data about my website, and to calculate the revenue I have a quite difficult query to build.
We have the field %product%, which is formatted as follows:
;%productID%;%productQuantity%;%productRevenue%;;
If more than one product has been bought, the products are delimited by ",", which can give this:
;12345678;1;49.99;;,;45678912;1;54.99;;
;45678912;2;59.98;;,;14521452;2;139.98;;,;12345678;2;19.98;;
;14521452;1;54.99;;
The only way to calculate the revenue is to sum all the %productRevenue% values in a line and store the result in a column.
I have no idea how to do it with just a SQL query... Maybe with RegEx? Any ideas?
I'd like to create a view with that info so I can easily pull the data into Power BI. But maybe I should process it with M directly in Power BI?
Thanks a lot,
Alex

Below is for BigQuery Standard SQL
#standardSQL
SELECT
SPLIT(i, ';')[OFFSET(1)] productID,
SUM(CAST(SPLIT(i, ';')[OFFSET(2)] AS INT64)) productQuantity,
SUM(CAST(SPLIT(i, ';')[OFFSET(3)] AS FLOAT64)) productRevenue
FROM `project.dataset.table`,
UNNEST(SPLIT(product)) i
GROUP BY productID
Applied to the sample data from your question, the output is:
Row  productID  productQuantity  productRevenue
1    12345678   3                69.97
2    45678912   3                114.97
3    14521452   3                194.97
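The query above totals quantity and revenue per productID across all rows, while the question asked for one revenue figure per line. As a minimal sketch of that per-line parsing logic (written in Python rather than BigQuery SQL, with a hypothetical helper name `row_revenue`, and assuming every entry follows the `;id;quantity;revenue;;` layout from the question):

```python
from decimal import Decimal

def row_revenue(product: str) -> Decimal:
    """Sum the revenue of every product entry in one `product` field value."""
    total = Decimal("0")
    for entry in product.split(","):        # one entry per purchased product
        if not entry:
            continue                         # tolerate an empty field
        fields = entry.split(";")            # ['', id, quantity, revenue, '', '']
        total += Decimal(fields[3])          # revenue is the fourth piece
    return total

samples = [
    ";12345678;1;49.99;;,;45678912;1;54.99;;",
    ";45678912;2;59.98;;,;14521452;2;139.98;;,;12345678;2;19.98;;",
    ";14521452;1;54.99;;",
]
for line in samples:
    print(row_revenue(line))                 # 104.98, 219.94, 54.99
```

Decimal is used here only to keep the cents exact; the SQL answer casts to FLOAT64 instead.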

Related

Quick one on Big Query SQL-Ecommerce Data

I am trying to replicate the Google Analytics data in BigQuery but haven't been able to.
Basically, I am using Custom Dimension 40 (user subscription status),
but I am getting wrong numbers in BQ.
Can someone help me with this?
I am using this query but couldn't work out the correct one.
SELECT
(SELECT value FROM hits.customDimensions where index=40) AS UserStatus,
COUNT(hits.transaction.transactionId) AS Unique_Purchases
FROM
`xxxxxxxxxxxxx.ga_sessions_2020*` AS GA, --new rollup
UNNEST(GA.hits) AS hits
WHERE
(SELECT value FROM hits.customDimensions where index=40) IN ("xx001","xxx002")
GROUP BY 1
This is the result I get from BigQuery, which is wrong.
I have also checked the dates, but I don't know why it's wrong.
Your question is rather unclear. But because you want something to be unique and numbers are mysteriously not what you want, I would suggest using COUNT(DISTINCT):
COUNT(DISTINCT hits.transaction.transactionId) AS Unique_Purchases
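To illustrate why COUNT and COUNT(DISTINCT) can disagree, here is a small sketch using Python's sqlite3 module. The `hits` table and its rows are made up for the example and are not the real Google Analytics export schema; the point is only that a transaction ID repeated across several hits is counted once per row by COUNT but once overall by COUNT(DISTINCT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (transaction_id TEXT, user_status TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [
        ("T1", "xx001"),
        ("T1", "xx001"),    # same transaction seen on a second hit
        ("T2", "xx001"),
        ("T3", "xxx002"),
        (None, "xxx002"),   # hit without a transaction: ignored by both counts
    ],
)

rows = conn.execute(
    """
    SELECT user_status,
           COUNT(transaction_id)          AS all_rows,
           COUNT(DISTINCT transaction_id) AS unique_purchases
    FROM hits
    GROUP BY user_status
    ORDER BY user_status
    """
).fetchall()

for row in rows:
    print(row)   # ('xx001', 3, 2) then ('xxx002', 1, 1)
```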
As far as I understand, you imported Google Analytics data into BigQuery and you are trying to group by the custom dimension with index 40, restricted to the values ("xx001","xxx002"), in order to know how many transaction hits occurred for each of these dimension values.
Replicating your scenario and trying to execute the query you posted, I got the following error.
However, I created a query that could help with your use case. First, it selects the transactionId and dimension value for hits where transactionId is not null and the dimension index equals 40; it then groups by the dimension value, filtered to the values "xx001" and "xxx002".
WITH tx AS (
SELECT
HIT.transaction.transactionId,
CD.value
FROM
`xxxxxxxxxxxxx.ga_sessions_2020*` AS GA,
UNNEST(GA.hits) AS HIT,
UNNEST(HIT.customDimensions) AS CD
WHERE
HIT.transaction.transactionId IS NOT NULL
AND
CD.index = 40
)
SELECT tx.value AS UserStatus, count(tx.transactionId) AS Unique_Purchases
FROM tx
WHERE tx.value IN ("xx001","xxx002")
GROUP BY tx.value
For further details about the format and schema of the data that is imported into BigQuery, I found this document.

How can I reduce Google BigQuery costs?

I have been searching using Google BigQuery on the GDELT database of global news. I am repeating the same search 54 times, just changing the name of an African country.
Is it possible to include all 54 searches in the same query? As I understand the billing, the cost is based on the size of the database searched, not the number of query elements. Is that correct?
Here is an example of my queries for the country of Gabon, selecting themes appearing with ICT.
SELECT theme, COUNT(*) as count
FROM (
select UNIQUE(REGEXP_REPLACE(SPLIT(V2locations,';'), r',.*', '')) theme
from [gdelt-bq:gdeltv2.gkg]
where DATE>20150302000000 and DATE < 20200609000000 and V2locations like '%Gabon%'
AND V2themes like '%WB_133_INFORMATION_AND_COMMUNICATION_TECHNOLOGIES%'
)
group by theme
ORDER BY 2 DESC
LIMIT 300
The simplest way to do so without changing your query logic is to replace
V2locations like '%Gabon%'
with
REGEXP_MATCH(V2locations, r'Gabon|Angola|Zimbabwe')
Note: the query in question is in BigQuery Legacy SQL, so I would recommend migrating to Standard SQL
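As a rough sketch of the alternation idea, here is the same kind of pattern built in Python from a list of names instead of typed by hand. The location strings are simplified, made-up stand-ins for V2locations values. Note that one combined filter pools all countries together, so a per-country breakdown needs an extra step such as the counter below; and, as with the original LIKE '%Gabon%', this is plain substring matching.

```python
import re
from collections import Counter

countries = ["Gabon", "Angola", "Zimbabwe"]   # extend to all 54 names
# re.escape guards against names containing regex metacharacters
pattern = re.compile("|".join(re.escape(name) for name in countries))

# Hypothetical, simplified location strings (not the real GDELT layout)
locations = [
    "1#Gabon#GB",
    "1#France#FR",
    "1#Angola#AO;1#Gabon#GB",
    "4#Harare, Zimbabwe#ZI",
]

# One pass over the data keeps every row mentioning any listed country
matched = [loc for loc in locations if pattern.search(loc)]

# Attribute each matched row to the countries it mentions
per_country = Counter()
for loc in matched:
    per_country.update(set(pattern.findall(loc)))

print(len(matched))
print(dict(per_country))
```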

Flatten nested data in Big Query to a single row

This is what the data looks like
This is what I am trying to achieve
I just need the flattened data to show destination 1 and destination 2 as well as duration 1 and duration 2.
I have used the unnest function in Big Query but it creates multiple rows. I am unable to use any aggregation to group the multiple rows as the data is non-numeric. Thank you for helping!
Below is for BigQuery Standard SQL
#standardSQL
SELECT EnquiryReference,
Destinations[OFFSET(0)].Name AS Destination1,
Destinations[SAFE_OFFSET(1)].Name AS Destination2,
Destinations[OFFSET(0)].Duration AS Duration1,
Destinations[SAFE_OFFSET(1)].Duration AS Duration2
FROM `project.dataset.table`
Applied to the sample data from your question, this returns a single row per enquiry, with Destination1, Destination2, Duration1, and Duration2 as columns.
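For readers unfamiliar with the difference between OFFSET and SAFE_OFFSET used in the query, the following Python sketch mimics the behaviour on made-up rows: the first element is required, while a missing second element becomes None (NULL in BigQuery) instead of raising an error. The helper names and sample values are assumptions for illustration only.

```python
from typing import Optional

def safe_offset(items: list, index: int) -> Optional[dict]:
    """Return items[index], or None when the index is out of range."""
    return items[index] if 0 <= index < len(items) else None

def flatten(row: dict) -> dict:
    dests = row["Destinations"]
    first = dests[0]                 # like OFFSET(0): fails on an empty array
    second = safe_offset(dests, 1)   # like SAFE_OFFSET(1): None if absent
    return {
        "EnquiryReference": row["EnquiryReference"],
        "Destination1": first["Name"],
        "Destination2": second["Name"] if second else None,
        "Duration1": first["Duration"],
        "Duration2": second["Duration"] if second else None,
    }

# Hypothetical sample rows
two = {"EnquiryReference": "ENQ-1",
       "Destinations": [{"Name": "Paris", "Duration": 3},
                        {"Name": "Rome", "Duration": 4}]}
one = {"EnquiryReference": "ENQ-2",
       "Destinations": [{"Name": "Oslo", "Duration": 2}]}

print(flatten(two))
print(flatten(one))
```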

sql statement to calculate the average for a selected set of values

I am new to Access and SQL statements. I have two tables, Site_ID and SE_WaterQuality_Data. For each site, several water quality parameters were collected over 5 weeks in summer and 5 weeks in winter. I want to be able to run a query that will return a table that shows the average of a particular parameter (e.g. Temp) grouped by the Site_ID and the sample period (e.g. summer 2013). I am close, but my output table only shows the average value and not the site ID or sample period. The query also prompts the user to enter a particular Site_ID, and I want it to run the query for all sites.
My SQL statement at the moment is
SELECT Avg(SE_WaterQuality_Data.[TEMP (C)]) AS [AvgOfTEMP (C)]
FROM SE_WaterQuality_Data
WHERE (((SE_WaterQuality_Data.EMS_ID)=[Site_ID].[EMS_ID]))
GROUP BY SE_WaterQuality_Data.EMS_ID, SE_WaterQuality_Data.SummaryPeriod;
And my output is
AveOFTEMP(C)
14.7
5.2
How can I change the SQL statement to 1) run the query for all sites and 2) return a table such as the one below:
Desired Output
Site_ID* SamplePeriod* AveTemp
1 Sum2013 14.2
1 Win2013 5.6
5 Sum2013 18.5
Help please......
If you want to run for all sites, take out the WHERE clause. And if you want to show other columns, include them in your SELECT clause.
SELECT [EMS_ID] AS [Site_ID],
[SummaryPeriod] AS [Sample_Period],
Avg(SE_WaterQuality_Data.[TEMP (C)]) AS [AvgOfTEMP (C)]
FROM SE_WaterQuality_Data
GROUP BY SE_WaterQuality_Data.EMS_ID, SE_WaterQuality_Data.SummaryPeriod;
I hope I got the syntax details right. I don't use SQL Server, I use MySQL. But the basic ideas are the same in all SQL dialects.
SELECT Site.Site_Id, WQ.SummaryPeriod, Avg(WQ.[TEMP (C)]) AS AveTemp
FROM SE_WaterQuality_Data WQ, Site
WHERE WQ.EMS_ID = Site.EMS_ID
GROUP BY Site.Site_Id, WQ.SummaryPeriod
;
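As a quick way to see the effect of listing the grouping columns in the SELECT clause, here is a sketch that runs the same kind of GROUP BY against an in-memory SQLite database from Python. The temperature readings are made up, and SQLite stands in for Access only because it also accepts the [bracketed] column name; Access itself may differ in other details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SE_WaterQuality_Data "
    "(EMS_ID INTEGER, SummaryPeriod TEXT, [TEMP (C)] REAL)"
)
conn.executemany(
    "INSERT INTO SE_WaterQuality_Data VALUES (?, ?, ?)",
    [
        (1, "Sum2013", 14.0), (1, "Sum2013", 15.0),   # hypothetical readings
        (1, "Win2013", 5.0),  (1, "Win2013", 6.0),
        (5, "Sum2013", 18.5),
    ],
)

rows = conn.execute(
    """
    SELECT EMS_ID AS Site_ID,
           SummaryPeriod AS Sample_Period,
           AVG([TEMP (C)]) AS AveTemp
    FROM SE_WaterQuality_Data
    GROUP BY EMS_ID, SummaryPeriod
    ORDER BY EMS_ID, SummaryPeriod
    """
).fetchall()

for row in rows:
    print(row)   # one line per site and sample period
```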

SQL query to produce a time x day grid from a list of timestamps?

Structures of my tables are as follow.
Table Name : timetable
timetable http://www.4shared.com/download/MYafV7-6ce/timetableTable.png
Table Name : slot_table
timetable http://www.4shared.com/download/9Lp_CBn2ba/slot_table.png
Table Name : instructor(this table is not required for this particular problem)
I want to show the resultant data in my android app in a timetable format somewhat like this:
random http://www.4shared.com/download/oAGiUXVAba/random.png
Question: What query should I write so that the result shows the subjects of each day with their respective slots?
1) The days should be in order, like Monday, Tuesday, Wednesday.
2) If Monday has 2 subjects in 2 different slots, then it should display like this:
Day      7:30-9:10AM        9:20-11:00AM
Monday   Android Workshop   Operating System
This is just a sample.
P.S.: As a timetable format is required, all the subjects with the slot ids of all the days (Monday to Saturday) must be in it.
Edit :
I tried this
select day,subject,slot from timetable,slot_table where timetable.slotid = slot_table.slotid
which gave a result :
a http://www.4shared.com/download/uMU7NA8Oce/random1.png
But I want it in a timetable format, and I have no idea how to do that.
Edit :
Timetable sample format is something like this :
Edit :
I wrote a query
select timetable.day,count(slot_table.subject) as no_of_classes from timetable,slot_table where timetable.slotid = slot_table.slotid group by timetable.day
which resulted in
a http://www.4shared.com/download/rZW20_g8ce/random2.png
So now it shows that Monday has 2 classes in 2 slots, Tuesday has 1 class in 1 slot, and so on.
Now, any help on a query which can show the two slots (timings) on Monday?
Solution :
select timetable.day,
  max(case when (slot='7:30-9:10AM') then slot_table.subject END) as "7:30-9:10AM",
  max(case when (slot='9:20-11:00AM') then slot_table.subject END) as "9:20-11:00AM",
  max(case when (slot='11:10-12:50PM') then slot_table.subject END) as "11:10-12:50PM",
  max(case when (slot='1:40-3:20PM') then slot_table.subject END) as "1:40-3:20PM",
  max(case when (slot='3:30-5:00PM') then slot_table.subject END) as "3:30-5:00PM"
from timetable join slot_table on timetable.slotid = slot_table.slotid
group by timetable.day
Result :
a http://www.4shared.com/download/1w7Tyicfce/random3.png
What you want is called a PIVOT query. You start from a select that returns the data in rows, like your result just under the first Edit (day, subject, slot). Then you specify which values of the column you want to 'pivot' become columns (slot in this example). Because a pivot relies on the values of the pivoted column, it can be difficult to write a general query; the Postgres Wiki has an example that uses dynamic SQL, with a lot of code to generate it, at http://wiki.postgresql.org/wiki/Pivot_query
In your case, the slots look fixed, so you might be able to hard-code them (that's a decision you'll have to make yourself).
NB I am not a Postgres user, but it looks like it can do it (and I would have been very surprised if it couldn't).
This is a pivot or crosstab query. PostgreSQL has only limited support for these via the crosstab function in the tablefunc module.
It can sometimes be better to just deal with this in the application, accumulating the data into a table as you read each data point.
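As a minimal sketch of handling this in the application, the rows returned by the simple join (day, subject, slot) can be accumulated into a grid keyed by day and slot. It is shown in Python for brevity; the same idea carries over to the Android code. The slot labels are taken from the question, while the day spelling, the function name `build_grid`, and the third sample row are assumptions.

```python
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
SLOTS = ["7:30-9:10AM", "9:20-11:00AM", "11:10-12:50PM",
         "1:40-3:20PM", "3:30-5:00PM"]

def build_grid(rows):
    """Accumulate (day, subject, slot) rows into a day x slot grid."""
    # Pre-fill every cell so that free slots and empty days still appear
    grid = {day: {slot: None for slot in SLOTS} for day in DAY_ORDER}
    for day, subject, slot in rows:
        grid[day][slot] = subject
    return grid

# Rows as the join query would return them (third row is hypothetical)
rows = [
    ("Monday", "Android Workshop", "7:30-9:10AM"),
    ("Monday", "Operating System", "9:20-11:00AM"),
    ("Tuesday", "Database Systems", "7:30-9:10AM"),
]

grid = build_grid(rows)
for day in DAY_ORDER:                      # days come out in weekday order
    print(day, [grid[day][slot] or "-" for slot in SLOTS])
```

Building the grid from a fixed day list also takes care of the ordering requirement, which a plain GROUP BY on the day name does not guarantee.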