Google Workspace, BigQuery and Looker/Data Studio - sql

I wonder if someone is able to help me out here.
My company is a Google Workspace based company, and we pipe our data through to BigQuery using the settings within the Admin Panel. That all works fantastically and there is no issue there.
Within BigQuery, one of the fields is time_usec, which is the epoch time.
In one table (activity) this is stored in microseconds, which is fine, as I can transform the data in BigQuery using TIMESTAMP_MICROS(time_usec).
However, Looker/Data Studio ingests the data in its raw format.
I have tried the calculated field TODATE(time_usec, "MICROS", "%x"), which certainly transforms the data for display purposes, but it doesn't appear to work with date ranges.
To explain this better:
Here is an example of the data in BigQuery. This is part of the Activity table with:
time_usec being epoch time (in microseconds)
email being the user account (every activity that is logged has a time_usec entry)
device_type being the type of device (everything from Windows machines, to iOS etc)
device_model being the model of the device
os_version being the version of the OS running on the device
To be fair, the data for the most part is irrelevant here (other than the time_usec) but I'm adding it to give a better idea of the type of data.
time_usec        | email                             | device_type    | device_model                  | os_version
1659952732837000 | new.user.3#somemailservice.com    | DESKTOP_CHROME |                               | ChromeOs 14816.131.0
1659952299942000 | some.email#somemailservice.com    | WINDOWS        | HP EliteBook 850 G5           | Windows 10.0.19044
1659952366245000 | new.user.3#somemailservice.com    | DESKTOP_CHROME |                               |
1659952736142000 | new.user.3#somemailservice.com    | DESKTOP_CHROME |                               | ChromeOs 14816.131.0
1659945047719000 | some.email#somemailservice.com    | WINDOWS        | HP EliteBook 850 G5           | Windows 10.0.19044
1659959338167000 | another.email#somemailservice.com | DESKTOP_CHROME | HP Elite Dragonfly Chromebook | ChromeOs 14909.100.0
1659959340697000 | another.email#somemailservice.com | DESKTOP_CHROME | HP Elite Dragonfly Chromebook | ChromeOs 14909.100.0
1659961092792000 | some.email#somemailservice.com    | WINDOWS        | HP EliteBook 850 G5           | Windows 10.0.19044
1659958186331000 | another.email#somemailservice.com | WINDOWS        | HP COMPAQ PRO 6305 SFF        | Windows 10.0.19044
1659957469855000 | some.email#somemailservice.com    | WINDOWS        | HP EliteBook 850 G5           | Windows 10.0.19044
Here is how the connection is set up in Looker/Data Studio:
Here is how the time_usec field in configured in Looker/Data Studio:
Here is the page in Looker/Data Studio:
The column titled Date with the red underline is using TODATE(time_usec, "MICROS", "%x").
This is configured as a Date Range Dimension using the above method, then added as a normal dimension after. For the purpose of displaying a date, this works fine.
The yellow box gives us the ability to select a date range. In the image it is set to 'Auto date range' and it works fine. However, if I change this to, let's say, the last 14 days, the table returns 'No data'.
As far as the Data itself is concerned, the time_usec field is set as a Number with no default aggregation. I have seen other questions about this answered in the past, with people saying that you can configure the field to be a Date rather than a number. However, attempting to do so produces the following message:
Looker Studio can't convert time_usec (Number) to a Date or Date &
Time because it doesn't recognize the date format.
Possible solutions:
Change your data to use a supported format
Create a calculated field to convert time_usec to a valid date. For example,
to convert "202011" to a date consisting of year and month: Example:
PARSE_DATE("%Y%m", time_usec)
I have also tried PARSE_DATETIME("%x", time_usec) and PARSE_DATE("%x", time_usec) as calculated fields. Again, for the purpose of displaying a date these seem to work fine. However, when applying the date range it breaks with a message saying:
Am I doing something wrong here? I would rather not mess with the data at the BigQuery level. And I know I can use custom SQL to do the TIMESTAMP_MICROS(time_usec) conversion and then bring it in. But surely there is a better way to do this within Looker/Data Studio?
EDITED:
Added an example table with data and some config screenshots.

Here's a formula that's working well on my end, including with a date range control filter. Please try it:
DATE_FROM_UNIX_DATE(CAST(Time Only/86400000000 as INT64))
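For what it's worth, the arithmetic behind this checks out: dividing epoch microseconds by 86,400,000,000 (the number of microseconds in a day) yields whole days since the Unix epoch, which is what DATE_FROM_UNIX_DATE expects. A minimal Python sketch (not part of the original answer) using the first time_usec value from the sample table above:
from datetime import datetime, timedelta, timezone

time_usec = 1659952732837000  # first sample value from the activity table
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Equivalent of BigQuery's TIMESTAMP_MICROS(time_usec): microseconds since the epoch -> timestamp
print(epoch + timedelta(microseconds=time_usec))          # 2022-08-08 09:58:52.837000+00:00

# Equivalent of the formula above: whole days since the epoch -> date
days_since_epoch = time_usec // 86_400_000_000            # integer division by microseconds per day
print((epoch + timedelta(days=days_since_epoch)).date())  # 2022-08-08
One caveat, which is an assumption on my part and worth verifying in your own report: CAST(... AS INT64) may round rather than truncate, so timestamps after 12:00 UTC could land on the next day; wrapping the division in FLOOR() before the CAST avoids that.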

I searched for a long time for a solution to my similar problem.
I had the time in Unix microseconds. This is how I converted it to a date:
PARSE_DATETIME("%d-%m-%Y %H:%M:%S", CONCAT(DAY(time_usec, "MICROS"),"-",MONTH(time_usec, "MICROS"),"-",YEAR(time_usec, "MICROS")," ",HOUR(time_usec, "MICROS"),":",MINUTE(time_usec, "MICROS"),":",SECOND(time_usec, "MICROS")))

Related

How to change the Splunk Forwarder Formatting of my Logs?

I am using our enterprise's Splunk forwarder, which seems to be logging events in Splunk like this, and that makes reading the Splunk logs a bit difficult.
{"log":"[https-jsse-nio-8443-exec-5] 19 Jan 2021 15:30:57,237+0000 UTC INFO rdt.damien.services.CaseServiceImpl CaseServiceImpl :: showCase :: Case Created \n","stream":"stdout","time":"2021-01-19T15:30:57.24005568Z"}
However, there are other orgs in our sibling enterprise whose Splunk logs look like this, which is far more readable. (There is no relation between us and them in tech, so we are not able to leverage their tech support to triage this.)
[http-nio-8443-exec-7] 15 Jan 2021 21:08:49,511+0000 INFO DaoOImpl [{applicationSystemCode=dao-app, userId=ANONYMOUS, webAnalyticsCorrelationId=|}]: This is a sample log
Please note the difference in the logs (mine vs. theirs):
{"log":"[https-jsse-nio-8443-exec-5]..
vs
[http-nio-8443-exec-7]...
Our enterprise team is struggling to determine what causes this. I checked my app.log, which looks OK (logged using Log4j) and doesn't have the aforementioned {"log": ...} wrapper:
[https-jsse-nio-8443-exec-5] 19 Jan 2021 15:30:57,237+0000 UTC INFO
rdt.damien.services.CaseServiceImpl CaseServiceImpl:: showCase :: Case
Created
Could someone guide me as to where the problem/configuration might lie that is causing the Splunk forwarder to send the logs to Splunk in the {"log": ...} format? I thought it was something to do with a JSON vs. RAW source type, which I also don't fully understand. Is that the cause, and if it is, which configs are driving it?
Over the course of the investigation I found that it is not Splunk doing this but rather the Docker container. Docker defaults to the json-file logging driver, which writes the output to the /var/lib/docker/containers folder in files with a *-json.log suffix that contain the logs in the {"log": <event>} format.
I now need to figure out how to configure the Docker logging (i.e. the Docker logging driver) to write in a non-JSON format.
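Not from the original post, but for anyone landing here: Docker's logging driver can be changed per container with --log-driver or globally in /etc/docker/daemon.json. A minimal daemon.json sketch that switches the default away from json-file (assuming the standard config location; the daemon needs a restart, and existing containers must be recreated before they pick up the new default):
{
  "log-driver": "local"
}
Docker also ships a splunk logging driver (configured via log-opts such as splunk-url and splunk-token) that sends events straight to a Splunk HTTP Event Collector, which may be a cleaner fit than tailing the *-json.log files with a forwarder.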

Spark structured streaming groupBy not working in append mode (works in update)

I'm trying to get a streaming aggregation/groupBy working in append output mode, to be able to use the resulting stream in a stream-to-stream join. I'm working on (Py)Spark 2.3.2, and I'm consuming from Kafka topics.
My pseudo-code is something like below, running in a Zeppelin notebook
# window, collect_list, struct, count, sum and min here come from pyspark.sql.functions
orderStream = spark.readStream.format("kafka").option("startingOffsets", "earliest").....

orderGroupDF = (orderStream
    .withWatermark("LAST_MOD", "20 seconds")
    .groupBy("ID", window("LAST_MOD", "10 seconds", "5 seconds"))
    .agg(
        collect_list(struct("attra", "attrb2", ...)).alias("orders"),
        count("ID").alias("number_of_orders"),
        sum("PLACED").alias("number_of_placed_orders"),
        min("LAST_MOD").alias("first_order_tsd")
    )
)

debug = (orderGroupDF.writeStream
    .outputMode("append")
    .format("memory")
    .queryName("debug")
    .start()
)
After that, I would have expected data to appear on the debug query so that I can select from it (after the late-arrival window of 20 seconds has expired). But no data ever appears on the debug query (I waited several minutes).
When I change the output mode to update, the query works immediately.
Any hint on what I'm doing wrong?
EDIT: after some more experimentation, I can add the following (but I still don't understand it).
When starting the Spark application, there is quite a lot of old data (with event timestamps << current time) on the topic from which I consume. After starting, it seems to read all these messages (MicroBatchExecution in the log reports "numRowsTotal = 6224" for example), but nothing is produced on the output, and the eventTime watermark in the log from MicroBatchExecution stays at epoch (1970-01-01).
After producing a fresh message onto the input topic with eventTimestamp very close to current time, the query immediately outputs all the "queued" records at once, and bumps the eventTime watermark in the query.
What I can also see is that there seems to be an issue with the timezone. My Spark program runs in CET (currently UTC+2). The timestamps in the incoming Kafka messages are in UTC, e.g. "LAST_MOD": "2019-05-14 12:39:39.955595000". I have set spark_sess.conf.set("spark.sql.session.timeZone", "UTC"). Still, the microbatch report after that "new" message has been produced onto the input topic says:
"eventTime" : {
"avg" : "2019-05-14T10:39:39.955Z",
"max" : "2019-05-14T10:39:39.955Z",
"min" : "2019-05-14T10:39:39.955Z",
"watermark" : "2019-05-14T10:35:25.255Z"
},
So the eventTime somehow lines up with the time in the input message, but it is 2 hours off; the UTC difference has been subtracted twice. Additionally, I fail to see how the watermark calculation works. Given that I set it to 20 seconds, I would have expected it to be 20 seconds older than the max event time. But apparently it is 4 min 14 s older. I fail to see the logic behind this.
I'm very confused...
It seems that this was related to the Spark version 2.3.2 that I used, and maybe more concretely to SPARK-24156. I have upgraded to Spark 2.4.3 and there I get the results of the groupBy immediately (well, of course after the watermark lateThreshold has expired, but "in the expected timeframe").

LabVIEW to TRNSYS communication through type 62 error

I am trying to transfer data from TRNSYS (a heating simulation program) to LabVIEW 2013 32-bit in real time through Type 62 (in TRNSYS). Type 62 is an Excel file that transfers real-time data from TRNSYS to LabVIEW (and the other way round). My LabVIEW program works on two different Windows 10 PCs but does not work on a Windows 7 machine, although the Excel versions are the same. It says there is an
Error 14012 occurred at DDE Request
I have not attached the TRNSYS file but I can send it if you need it.
Do you have any recommendations?
Thank you
Error 14012 is a "Transaction Failed" error. Did you try troubleshooting your Excel? Also, what's your Excel version?
Here's a link: https://support.microsoft.com/en-gb/help/3001579/an-error-occurred-when-sending-commands-to-the-program-error-in-excel
I hope this helps...

Google Finance: How big is a normal delay for historical stock data or is something broken?

I tried to download historical data from Google with this code:
import pandas_datareader.data as wb
import datetime
web_df = wb.DataReader("ETR:DAI", 'google',
                       datetime.date(2017, 9, 1),
                       datetime.date(2017, 9, 7))
print(web_df)
and got this:
Open High Low Close Volume
Date
2017-09-01 61.38 62.16 61.22 61.80 3042884
2017-09-04 61.40 62.01 61.31 61.84 1802854
2017-09-05 62.01 62.92 61.77 62.42 3113816
My question: Is this a normal delay or is something broken?
Also, I would like to know: have you noticed that Google has removed the historical data pages at Google Finance? Is this a hint that they will remove, or have already removed, the download option for historical stock data too?
Google Finance via pandas has stopped working since last night; I am trying to figure out why. I have also noticed that the links to the historical data on their website have been removed.
It depends on which stocks and which market.
For example, with the Indonesian market it is still able to get the latest data. Of course, it may soon follow the fate of other markets that stopped updating on 5 September 2017. A very sad thing.
web_df = wb.DataReader("IDX:AALI", 'google',
                       datetime.date(2017, 9, 1),
                       datetime.date(2017, 9, 7))
Open High Low Close Volume
Date
2017-09-04 14750.0 14975.0 14675.0 14700.0 475700
2017-09-05 14700.0 14900.0 14650.0 14850.0 307300
2017-09-06 14850.0 14850.0 14700.0 14725.0 219900
2017-09-07 14775.0 14825.0 14725.0 14725.0 153300

Performance Measurement - Get Average call time per function. Intel Vtune Amplifier

I'm simply trying to get the average time it takes each function to run.
That means I want the:
"Total time inside the function" / "Number of calls to the function"
I'm getting all sorts of information when I run an analysis from within VTune.
These are the settings I'm using:
And also:
But I can't find where average time is.
I can see the Total-Time per function but can't find the call count.
Using Visual Studio 2012, Vtune Amplifier XE 2013, Update 9.
Please help.
1) You have to run the "Advanced Hotspots" analysis configured as shown in your second screenshot. "Basic Hotspots" will NOT provide you with call count information.
2) Once you have completed "Advanced Hotspots", you can find the statistical (approximate) call count in the Bottom-up view, as shown in the screenshot below:
Finally, make sure that you have the "Loops and functions" mode selected at the bottom right side of the GUI (it's on by default, but who knows which options you've played with).
3) In order to figure out total time and self time, don't forget to make sure you have changed the "viewpoint" to "Hotspots" (see the area highlighted in green in my first screenshot, and also see the next picture).
4) Starting from the 2016 release, Parallel Studio has
"precise loop call count and trip count"
"precise function call count"
measurement tools
(as well as total, self and even elapsed time, and lots of SIMD-parallelism-related analysis) available in the Intel (a.k.a. "Vectorization") Advisor; see more info here: