Filter for records between two dates - apache-pig

I don't seem to see any keyword equivalent for MySQL 'BETWEEN' in Pig Latin.
What I want to do is filter for records between two specific dates. Any tips on how to do so in Pig Latin?

To do so, using FILTER.
variable = FILTER variable_holding_table_data BY (GetYear(date_column)==year) AND (GetMonth(date_column)==month) AND (GetDay(date_column)=>date_day_start AND GetDay(date_column)<=date_day_end);
e.g. query = FILTER orders BY (GetYear(date)==2013) AND (GetMonth(date)==05) AND (GetDay(date)>=01 AND GetDay(date)<=31);
Resources:
Pig Date Functions

Related

Access SQL GROUP BY problem (eg. tbl_Produktion.ID not part of the aggregation-function)

I want to group by two columns, however MS Access won't let me do it.
Here is the code I wrote:
SELECT
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter,
tbl_Produktion.ProduktionsID, tbl_Produktion.Linie,
tbl_Produktion.Schicht, tbl_Produktion.Anzahl_Schichten_P,
tbl_Produktion.Schichtteam, tbl_Produktion.Von, tbl_Produktion.Bis,
tbl_Produktion.Pause, tbl_Produktion.Kunde, tbl_Produktion.TeileNr,
tbl_Produktion.FormNr, tbl_Produktion.LabyNr,
SUM(tbl_Produktion.Stueckzahl_Prod),
tbl_Produktion.Stueckzahl_Ausschuss, tbl_Produktion.Ausschussgrund,
tbl_Produktion.Kommentar, tbl_Produktion.StvSchichtleiter,
tbl_Produktion.Von2, tbl_Produktion.Bis2, tbl_Produktion.Pause2,
tbl_Produktion.Arbeiter3, tbl_Produktion.Von3, tbl_Produktion.Bis3,
tbl_Produktion.Pause3, tbl_Produktion.Arbeiter4,
tbl_Produktion.Von4, tbl_Produktion.Bis4, tbl_Produktion.Pause4,
tbl_Produktion.Leiharbeiter5, tbl_Produktion.Von5,
tbl_Produktion.Bis5, tbl_Produktion.Pause5,
tbl_Produktion.Leiharbeiter6, tbl_Produktion.Von6,
tbl_Produktion.Bis6, tbl_Produktion.Pause6, tbl_Produktion.Muster
FROM
tbl_Personal
INNER JOIN
tbl_Produktion ON tbl_Personal.PersID = tbl_Produktion.Schichtleiter
GROUP BY
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter;
It works when I group it by all the columns, but not like this.
The error message say that the rest of the columns aren't part of the aggregation-function (translated from german to english as best as I could).
PS.: I also need the sum of "tbl_Produktion.Stueckzahl_Prod" therefore I tried using the SUM function (couldn't try it yet).
Have you tried something along these lines?
SELECT
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter,
MAX(tbl_Produktion.ProduktionsID), MAX(tbl_Produktion.Linie),
MAX(tbl_Produktion.Schicht), MAX(tbl_Produktion.Anzahl_Schichten_P),
MAX(tbl_Produktion.Schichtteam), MAX(tbl_Produktion.Von), MAX(tbl_Produktion.Bis),
SUM(tbl_Produktion.Stueckzahl_Prod)
FROM
tbl_Personal
INNER JOIN
tbl_Produktion ON tbl_Personal.PersID = tbl_Produktion.Schichtleiter
GROUP BY
tbl_Produktion.Datum, tbl_Produktion.Schichtleiter;
I have used the MAX function for all the data except the two items you specify in the GROUP BY and the one where you desire the SUM. I took the liberty of leaving out mush of your data just to get started.
Using the MAX function turns out to be a convenient workaround when the data item is known to be unique within each group. We cannot know your data or your itent, so we cannot tell you whether MAX will yield the results you need.
If you use an aggregation function in the select clause, you must group by every column that you're selecting that's not an aggregation. If you don't want to do that for some reason (perhaps it changes the output of the aggregation in way that you don't intend) you either must think of an aggregate to use (pick a value. Average? Max? Min?) or just do two selects, one for the aggregate, and one for the non-aggregates. But, then, you have to decide how to get the non-aggregated fields that make sense for the aggregate (or show them all in a table, I suppose?)

GCP Bigquery - query empty values from a record type value

I'm trying to query all resources that has empty records on a specific column but I'm unable to make it work. Here's the query that I'm using:
SELECT
service.description,
project.labels,
cost AS cost
FROM
`xxxxxx.xxxxx.xxxx.xxxx`
WHERE
service.description = 'BigQuery' ;
Here's the results:
As you can see, I'm getting everything with that query, but as mentioned, I'm looking to get resources with empty records only for example record 229,230 so on.
Worth to mention that schema for the column I'm trying to query is:
project.labels RECORD REPEATED
The above was mentioned because I tried using several combinations of WHERE but everything ends up in error.
To identify empty repeated record - you can use ARRAY_LENGTH in WHERE clause like in below example
WHERE ARRAY_LENGTH(project.labels) = 0

Array Aggregation - Retrieving an entire row of data in BigQuery

We have used array aggregation method and loaded the data in BigQuery
Clarification :
Is it possible to retrieve the specific value in array aggregation method? What are the methods available for retrieving the data from the field which have multiple records?
Query Clarification
We tried to find out the value of all data from the particular field which has multiple values in the screenshots [image.png] using below query but we got an error.
Sample Query
select fv,product.productSKU,product.productVariant,product.productBrand
from dataset.tablename
where hn=9 and product.productBrand='Politix'
You should use UNNEST as in below example
#standardSQL
SELECT
fv,
product.productSKU,
product.productVariant,
product.productBrand
FROM `dataset.tablename`,
UNNEST(product) product
WHERE hn=9
AND product.productBrand='Politix'
You can also check Working with Arrays in Standard SQL

How to filter columns via MDX

Im new (couple of days to be exact) on Cubes, I have the following problem.
I have a Measure that brings certain amount of data, 100 rows for example. From that data I want to filter numbers that are < 0 from one of its columns.
For example this measure:
[Measures].[Distribution CSU Groups]
Will bring data like this
enter image description here
As you can see in the link, I want to filter those rows that have negative values on the 3rd column.
Is it possible to do this via MDX and how?
Yes filtering is possible in mdx via the following functions:
IIF function
FILTER function
HAVING clause

Projecting Grouped Tuples in Pig

I have a collection of tuples of the form (t,a,b) that I want to group by b in Pig. Once grouped, I want to filter out b from the tuples in each group and generate a bag of filtered tuples per group.
As an example, assume we have
(1,2,1)
(2,0,1)
(3,4,2)
(4,1,2)
(5,2,3)
The pig script would produce
{(1,2),(2,0)}
{(3,4),(4,1)}
{(5,2)}
The question is: how do I go about producing this result? I'm used to seeing examples where aggregation operations follow a group by operation. It's less clear to me how to filter the tuples and return them in a bag. Thanks for your assistance!
Turns out what I was looking for is the syntax for nested projection in Pig.
If one has tuples of the form (t,a,b) and wants to drop b after the group by, it is done this way.
grouped = GROUP tups BY b;
result = FOREACH grouped GENERATE tup.(t,a);
See the "Nested Projection" section on the PigLatin page. http://wiki.apache.org/pig/PigLatin