Error: The identifier that starts with ... Max length is 128 - sql

When I execute the SQL statement below against SQL Server Express, I get this error:
"The identifier that starts with 'Select * FROM AvailabilityBlocks LEFT JOIN Location ON AvailabilityBlocks.LocationID=Location.LocationID WHERE ((AvailabilityBlo' is too long. Maximum length is 128."
SQL code:
SELECT *
FROM
(Resources
LEFT JOIN
[Select *
FROM AvailabilityBlocks
LEFT JOIN
Location ON AvailabilityBlocks.LocationID = Location.LocationID
WHERE
((AvailabilityBlocks.LocationID IN (8, 14, 16, 31, 1, 15, 17, 10, 9, 19, 12, 30, 5, 18, 13, 20, 3, 26, 2, 25, 28, 27, 32, 33)
AND (AvailabilityBlocks.Type = 3 OR AvailabilityBlocks.Type = 4))
OR AvailabilityBlocks.Type = 2)
AND [Begin] < '06-Apr-2015 12:00:00 AM'
AND [Begin] >= '30-Mar-2015 12:00:00 AM'
AND (WeekDay([Begin]) = 2 OR WeekDay([Begin]) = 3
OR WeekDay([Begin]) = 4 OR WeekDay([Begin]) = 5
OR WeekDay([Begin]) = 6 OR WeekDay([Begin]) = 7)]. AS FilteredTable ON Resources.ResourceID = FilteredTable.ResourceID)
LEFT JOIN
EmployeeTypes ON EmployeeTypes.TypeID = Resources.EmployeeType
ORDER BY
RClass, Resources.LastName ASC, Resources.FirstName ASC,
Resources.ResourceID ASC, [AvailabilityBlocks.Begin] ASC,
[AvailabilityBlocks.End] Desc, Location.SubType DESC
This SQL works fine when executed against an Access DB. Can anyone offer any suggestions?
Thanks in advance

You were using [ instead of ( for the named subquery. I fixed it below and added a little structure for readability.
SELECT * FROM
Resources
LEFT JOIN
(Select * FROM AvailabilityBlocks
LEFT JOIN Location ON AvailabilityBlocks.LocationID=Location.LocationID
WHERE ((AvailabilityBlocks.LocationID IN (8, 14, 16, 31, 1, 15, 17, 10, 9, 19, 12, 30, 5, 18, 13, 20, 3, 26, 2, 25, 28, 27, 32, 33)
AND (AvailabilityBlocks.Type = 3 OR AvailabilityBlocks.Type = 4)) OR AvailabilityBlocks.Type = 2)
AND [Begin] < '06-Apr-2015 12:00:00 AM'
And [Begin] >= '30-Mar-2015 12:00:00 AM'
AND (WeekDay([Begin])=2
OR WeekDay([Begin])=3
OR WeekDay([Begin])=4
OR WeekDay([Begin])=5
OR WeekDay([Begin])=6
OR WeekDay([Begin])=7)) AS FilteredTable
ON Resources.ResourceID=FilteredTable.ResourceID
LEFT JOIN EmployeeTypes ON EmployeeTypes.TypeID=Resources.EmployeeType
ORDER BY RClass, Resources.LastName ASC, Resources.FirstName ASC, Resources.ResourceID ASC, AvailabilityBlocks.[Begin] ASC, AvailabilityBlocks.[End] Desc, Location.SubType DESC
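The reason the bracketed form fails is that square brackets delimit an identifier, not a subquery. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in; it assumes SQLite's bracket quoting (which it supports for compatibility with Access and SQL Server) is a fair analogy. The table t is made up, and SQLite has no 128-character identifier limit, so its error message differs from the SQL Server one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")  # hypothetical table for the demo
conn.execute("INSERT INTO t VALUES (1)")

# Brackets quote an identifier, so the whole text is looked up as a table name.
try:
    conn.execute("SELECT * FROM [SELECT id FROM t] AS sub")
    bracket_error = None
except sqlite3.OperationalError as exc:
    bracket_error = str(exc)

# Parentheses make it a derived table (subquery), which is what was intended.
paren_rows = conn.execute("SELECT * FROM (SELECT id FROM t) AS sub").fetchall()

print(bracket_error)
print(paren_rows)
```

SQL Server reports the problem differently because it first rejects the bracketed text for exceeding its identifier length limit.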

There are a few things off with your query. I will give you what I think your query should look like, but I believe you also need to be able to break your query up so as to figure out what is wrong.
First things first, make sure the inner query in your left join actually works:
SELECT ab.* -- Note: only ab.* is selected here; if you need any columns from Location, add loc.<column needed>
FROM AvailabilityBlocks ab
LEFT JOIN Location loc ON ab.LocationID = loc.LocationID
WHERE
((ab.LocationID IN (8, 14, 16, 31, 1, 15, 17, 10, 9, 19, 12, 30, 5, 18, 13, 20, 3, 26, 2, 25, 28, 27, 32, 33)
AND (ab.Type = 3 OR ab.Type = 4))
OR ab.Type = 2)
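AND ab.[Begin] >= CAST('20150330' AS DATE)
AND ab.[Begin] < CAST('20150406' AS DATE) -- half-open range, matching the original < comparison
AND WeekDay(ab.[Begin]) IN (2, 3, 4, 5, 6, 7) -- WeekDay() is an Access function; SQL Server would need DATEPART(weekday, ...)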
Now, once you verify that the query above works, you can embed that working inner query into your left join. (I assume that most of the columns are in AvailabilityBlocks; if ResourceID is not, then add it via loc.ResourceID.)
SELECT r.*, et.*, FilteredTable.[Begin], FilteredTable.[End], FilteredTable.SubType
FROM Resources r
LEFT JOIN
(
SELECT ab.*, loc.SubType -- SubType comes from Location and is referenced in the outer query
FROM AvailabilityBlocks ab
LEFT JOIN Location loc ON ab.LocationID = loc.LocationID
WHERE
((ab.LocationID IN (8, 14, 16, 31, 1, 15, 17, 10, 9, 19, 12, 30, 5, 18, 13, 20, 3, 26, 2, 25, 28, 27, 32, 33)
AND (ab.Type = 3 OR ab.Type = 4))
OR ab.Type = 2)
AND ab.[Begin] >= CAST('20150330' AS DATE)
AND ab.[Begin] < CAST('20150406' AS DATE)
AND WeekDay(ab.[Begin]) IN (2, 3, 4, 5, 6, 7)
) AS FilteredTable ON r.ResourceID = FilteredTable.ResourceID
LEFT JOIN EmployeeTypes et ON et.TypeID = r.EmployeeType
Lastly, within the "order by" clause, use the appropriate aliases rather than table names:
ORDER BY
RClass, -- What table has this?
r.LastName ASC,
r.FirstName ASC,
r.ResourceID ASC,
FilteredTable.[Begin] ASC,
FilteredTable.[End] DESC,
FilteredTable.SubType DESC
Being able to break your queries apart to see what is causing problems is key to figuring out what other problems exist (meaning: don't expect people's solutions to give you the answer, but rather to guide you to figuring out the solution on your own).

Discrepancy between two similar presto SQL queries

I have the following SQL queries that should return similar results for the offset column, but don't.
Query 1:
SELECT visitor_id, array_agg(timestamp) as time, array_agg(offset) as offset_list
from
(SELECT * FROM
(
SELECT visitor_id, timestamp,
cast(json_extract(uri_args, '$.offset') AS int) as offset
FROM table_t
where year = 2023 and month = 1 and day = 27 and
request_uri = '/home_page')
order by visitor_id, timestamp)
group by visitor_id
order by cardinality(offset_list) desc
Query 2:
SELECT visitor_id ,array_agg(offset) as offset_list
from
(SELECT * FROM
(
SELECT visitor_id, timestamp,
cast(json_extract(uri_args, '$.offset') AS int) as offset
FROM table_t
where year = 2023 and month = 1 and day = 27 and
request_uri = '/home_page')
order by visitor_id, timestamp)
group by visitor_id
order by cardinality(offset_list) desc
Here uri_args is simply a JSON document that contains, under the key 'offset', the offset value for the particular API response. The data comes from the response log table of a server.
Although the two queries are similar and, in my view, ought to return the same values in the offset_list column, I find the following discrepancy:
To illustrate with a particular visitor: for visitor_id = '12345', query 1 returns the following in the offset_list column:
[0, 0, 0, 10, 0, 10, 20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 57, 77, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41]
and for query 2 the output is as follows :
[20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 77, 57, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41, 0, 0, 0, 10, 0, 10]
I can see that the two are circular permutations of each other, but I fail to see why this is happening. Please help me understand how the inner workings of the two queries differ. The first result suits my intent, which is to capture the visitor's journey on the homepage.
If you need your array elements deterministically ordered, specify an ORDER BY clause inside the aggregate function, as mentioned in the docs:
Some aggregate functions such as array_agg() produce different results depending on the order of input values. This ordering can be specified by writing an ORDER BY Clause within the aggregate function
SELECT visitor_id ,array_agg(offset order by timestamp) as offset_list
-- ...
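The effect can be illustrated outside Presto with a small Python sketch. It is only an analogy, assuming rows reach the aggregation in arbitrary order; the sample rows and the helper name aggregate_offsets are hypothetical. Sorting inside the aggregation, which is the role of array_agg(offset order by timestamp), yields the same list regardless of arrival order.

```python
import random
from collections import defaultdict

def aggregate_offsets(rows, order_by_timestamp):
    """Collect offsets per visitor, optionally ordered by timestamp."""
    groups = defaultdict(list)
    for visitor_id, timestamp, offset in rows:
        groups[visitor_id].append((timestamp, offset))
    result = {}
    for visitor_id, items in groups.items():
        if order_by_timestamp:
            # Plays the role of ORDER BY inside the aggregate function.
            items = sorted(items, key=lambda item: item[0])
        result[visitor_id] = [offset for _, offset in items]
    return result

# Hypothetical rows: (visitor_id, timestamp, offset).
rows = [("12345", 1, 0), ("12345", 2, 10), ("12345", 3, 20), ("12345", 4, 32)]
shuffled = rows[:]
random.Random(0).shuffle(shuffled)  # simulate a different arrival order

ordered_a = aggregate_offsets(rows, order_by_timestamp=True)
ordered_b = aggregate_offsets(shuffled, order_by_timestamp=True)
print(ordered_a == ordered_b)
```

Without the sort, the list simply mirrors whatever order the rows arrived in, which is why an ORDER BY in a subquery is not a reliable way to fix the array order.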

Extract last item from JSON in Cell

I have a column called submission_date with JSON cells that look like this:
{"submitted":["January 24, 2019","January 25, 2019","January 30,
2019","February 27, 2019"],"submission_canceled":["January 24,
2019","January 25, 2019"],"returned":"February 19, 2019"}
or like this:
{"submitted":["February 27, 2019","March 5, 2019"],"submission_canceled":"March 5, 2019"}
I can easily get the first result from the "submission_canceled" field by doing:
json_extract(submission_date, "$.submission_canceled[0]")
I would think that if I wanted the last value I would do:
json_extract(submission_date, "$.submission_canceled[-1]")
But that is just giving me back a null. As you can see, sometimes the submission_canceled field will have multiple dates in a list and other times it will just have a single date, not in a list. I'd like to get the single item or the last item in the list from the submission_canceled section.
Below example is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '{"submitted":["January 24, 2019","January 25, 2019","January 30, 2019","February 27, 2019"],"submission_canceled":["January 24, 2019","January 25, 2019"],"returned":"February 19, 2019"}' submission_date UNION ALL
SELECT 2, '{"submitted":["February 27, 2019","March 5, 2019"],"submission_canceled":"March 5, 2019"}'
)
SELECT id, REGEXP_REPLACE(ARRAY_REVERSE(SPLIT(JSON_EXTRACT(submission_date, '$.submission_canceled'), '","'))[OFFSET(0)], r'"|\[|\]', '') last_submission_canceled
FROM `project.dataset.table`
with result
Row id last_submission_canceled
1 1 January 25, 2019
2 2 March 5, 2019
Update - below is "lighter" version
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '{"submitted":["January 24, 2019","January 25, 2019","January 30, 2019","February 27, 2019"],"submission_canceled":["January 24, 2019","January 25, 2019"],"returned":"February 19, 2019"}' submission_date UNION ALL
SELECT 2, '{"submitted":["February 27, 2019","March 5, 2019"],"submission_canceled":"March 5, 2019"}'
)
SELECT id, REGEXP_EXTRACT(JSON_EXTRACT(submission_date, '$.submission_canceled'), r'"([^"]*)"\]?$') last_submission_canceled
FROM `project.dataset.table`
with obviously same result
Row id last_submission_canceled
1 1 January 25, 2019
2 2 March 5, 2019
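For readers who want to check the regular expression from the "lighter" version outside BigQuery, here is a minimal Python sketch. It assumes that json.dumps with compact separators is a reasonable stand-in for the JSON text JSON_EXTRACT returns; the function names are hypothetical. A second function parses the JSON directly, which may be easier to reason about where a JSON parser is available.

```python
import json
import re

cells = [
    '{"submitted":["January 24, 2019","January 25, 2019","January 30, 2019","February 27, 2019"],"submission_canceled":["January 24, 2019","January 25, 2019"],"returned":"February 19, 2019"}',
    '{"submitted":["February 27, 2019","March 5, 2019"],"submission_canceled":"March 5, 2019"}',
]

def last_canceled_regex(cell):
    # Assumption: json.dumps mimics the JSON text that JSON_EXTRACT would return.
    value = json.loads(cell)["submission_canceled"]
    extracted = json.dumps(value, separators=(",", ":"))
    # Same pattern as in the query: last quoted string, optionally followed by "]".
    match = re.search(r'"([^"]*)"\]?$', extracted)
    return match.group(1) if match else None

def last_canceled_parsed(cell):
    # Alternative: branch on whether the field is a list or a single value.
    value = json.loads(cell)["submission_canceled"]
    if isinstance(value, list):
        return value[-1] if value else None
    return value

regex_results = [last_canceled_regex(c) for c in cells]
parsed_results = [last_canceled_parsed(c) for c in cells]
print(regex_results)
print(parsed_results)
```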

BigQuery - Group By with multiple fields extremely slow

I'm trying to group by multiple fields, such as dates that span a few years with unique days (so 5*365 days maximum), and some unique IDs (a few thousand, I believe).
The query is pretty simple:
SELECT
cs.CriterionId,
cs.AdGroupId,
cs.CampaignId,
cs.Date,
SUM(cs.Impressions) AS Sum_Impressions,
SUM(cs.Clicks) AS Sum_Clicks,
SUM(cs.Interactions) AS Sum_Interactions,
(SUM(cs.Cost) / 1000000) AS Sum_Cost,
SUM(cs.Conversions) AS Sum_Conversions,
cs.AdNetworkType1,
cs.AdNetworkType2,
cs.AveragePosition,
cs.Device,
cs.InteractionTypes
FROM
`adwords.Keyword_{customer_id}` c
LEFT JOIN
`adwords.KeywordBasicStats_{customer_id}` cs
ON
c.ExternalCustomerId = cs.ExternalCustomerId
WHERE
c._DATA_DATE = c._LATEST_DATE
AND c.ExternalCustomerId = {customer_id}
GROUP BY
1, 2, 3, 4, 10, 11, 12, 13, 14
ORDER BY
1, 2, 3, 4, 10, 11, 12, 13, 14
The KeywordBasicStats table has around 700MB of data and Keyword has around 50MB, and the query has been running for a few hours now.
Not sure if there's a way to optimize this SQL query.
If anyone at Google is interested, the Job ID is:
blissful-land-197118:US.bquijob_668c014c_164b8710acc
Try this one (some fixes may be required depending on your column data types):
SELECT
cs.CriterionId,
cs.AdGroupId,
cs.CampaignId,
cs.Date,
SUM(cs.Impressions) AS Sum_Impressions,
SUM(cs.Clicks) AS Sum_Clicks,
SUM(cs.Interactions) AS Sum_Interactions,
(SUM(cs.Cost) / 1000000) AS Sum_Cost,
SUM(cs.Conversions) AS Sum_Conversions,
cs.AdNetworkType1,
cs.AdNetworkType2,
cs.AveragePosition,
cs.Device,
cs.InteractionTypes
FROM
`adwords.Keyword_{customer_id}` c
INNER JOIN
`adwords.KeywordBasicStats_{customer_id}` cs
ON
c.ExternalCustomerId = cs.ExternalCustomerId
WHERE
c._DATA_DATE = c._LATEST_DATE
AND c.ExternalCustomerId = {customer_id}
GROUP BY
1, 2, 3, 4, 10, 11, 12, 13, 14
UNION ALL
SELECT
cs.CriterionId,
cs.AdGroupId,
cs.CampaignId,
cs.Date,
0.0 AS Sum_Impressions,
0.0 AS Sum_Clicks,
0.0 AS Sum_Interactions,
0.0 AS Sum_Cost,
0.0 AS Sum_Conversions,
cs.AdNetworkType1,
cs.AdNetworkType2,
cs.AveragePosition,
cs.Device,
cs.InteractionTypes
FROM
`adwords.Keyword_{customer_id}` c
LEFT JOIN
`adwords.KeywordBasicStats_{customer_id}` cs
ON
c.ExternalCustomerId = cs.ExternalCustomerId
WHERE cs.ExternalCustomerId IS NULL
AND c._DATA_DATE = c._LATEST_DATE
AND c.ExternalCustomerId = {customer_id}
GROUP BY
1, 2, 3, 4, 10, 11, 12, 13, 14
ORDER BY
1, 2, 3, 4, 10, 11, 12, 13, 14
I think what makes this query extremely slow is the ORDER BY.
Just remove it and try again.

Reuse query value in the same query

Gonna' try to make this quick... Query below.
SELECT PriorityDefID, MilestoneDefID, MilestoneName, ContactName,
IIF(PriorityDefID = 1, (SELECT BonusDaysFH FROM milestone_def WHERE (( MilestoneDefID = IIF(MilestoneDefID = 5, 5, IIF(MilestoneDefID = 6, 6, IIF(MilestoneDefID = 7, 7))) )) ),
IIF(PriorityDefID = 2, (SELECT BonusDaysFM FROM milestone_def WHERE (( MilestoneDefID = IIF(MilestoneDefID = 5, 5, IIF(MilestoneDefID = 6, 6, IIF(MilestoneDefID = 7, 7))) )) ),
IIF(PriorityDefID = 3, (SELECT BonusDaysFL FROM milestone_def WHERE (( MilestoneDefID = IIF(MilestoneDefID = 5, 5, IIF(MilestoneDefID = 6, 6, IIF(MilestoneDefID = 7, 7))) )) ) ))) AS BonusDaysAllotted,
StartDate, EndDate
FROM GetPerformance
WHERE (((MilestoneDefID) = 5 Or (MilestoneDefID) = 6 Or (MilestoneDefID) = 7));
I am ultimately trying to get the value of the MilestoneDefID and reuse it in the subquery to determine which BonusDays column to return. The subquery wants to return three rows with the results of passing each value of 5, 6 and 7. For each row returned from the GetPerformance query, I want it to take the MilestoneDefID from that row and then go into the subquery and pass that MilestoneDefID to return the correct number of BonusDays.
I say use union in this query.
select a.PriorityDefID, a.MilestoneDefID, a.MilestoneName, a.ContactName,
b.BonusDaysFH as BonusDaysAllotted, a.StartDate, a.EndDate
from GetPerformance a, milestone_def b
where ((a.MilestoneDefID=5) or (a.MilestoneDefID=6) or (a.MilestoneDefID=7))
and b.MilestoneDefID=a.MilestoneDefID
and a.PriorityDefID=1
union
select a.PriorityDefID, a.MilestoneDefID, a.MilestoneName, a.ContactName,
b.BonusDaysFM as BonusDaysAllotted, a.StartDate, a.EndDate
from GetPerformance a, milestone_def b
where ((a.MilestoneDefID=5) or (a.MilestoneDefID=6) or (a.MilestoneDefID=7))
and b.MilestoneDefID=a.MilestoneDefID
and a.PriorityDefID=2
union
select a.PriorityDefID, a.MilestoneDefID, a.MilestoneName, a.ContactName,
b.BonusDaysFL as BonusDaysAllotted, a.StartDate, a.EndDate
from GetPerformance a, milestone_def b
where ((a.MilestoneDefID=5) or (a.MilestoneDefID=6) or (a.MilestoneDefID=7))
and b.MilestoneDefID=a.MilestoneDefID
and a.PriorityDefID=3
Sadly, this will make three queries, but I believe the lack of Iif's will improve performance.
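As an alternative to three UNION branches, a single join with a CASE expression can pick the bonus column per row. The sketch below runs on SQLite through Python's sqlite3 module rather than on Access, GetPerformance is modelled as a plain table, and the sample rows and bonus values are invented for illustration. In Access the CASE would presumably have to be written with nested IIF or Switch instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE milestone_def (MilestoneDefID INTEGER, BonusDaysFH INTEGER,
                            BonusDaysFM INTEGER, BonusDaysFL INTEGER);
CREATE TABLE GetPerformance (PriorityDefID INTEGER, MilestoneDefID INTEGER,
                             MilestoneName TEXT);
-- Invented sample data.
INSERT INTO milestone_def VALUES (5, 1, 2, 3), (6, 4, 5, 6), (7, 7, 8, 9);
INSERT INTO GetPerformance VALUES (1, 5, 'A'), (2, 6, 'B'), (3, 7, 'C'), (1, 4, 'D');
""")

# The join passes each row's MilestoneDefID to milestone_def;
# the CASE then picks the bonus column that matches the row's priority.
rows = conn.execute("""
SELECT a.PriorityDefID, a.MilestoneDefID, a.MilestoneName,
       CASE a.PriorityDefID
            WHEN 1 THEN b.BonusDaysFH
            WHEN 2 THEN b.BonusDaysFM
            WHEN 3 THEN b.BonusDaysFL
       END AS BonusDaysAllotted
FROM GetPerformance AS a
JOIN milestone_def AS b ON b.MilestoneDefID = a.MilestoneDefID
WHERE a.MilestoneDefID IN (5, 6, 7)
ORDER BY a.MilestoneDefID
""").fetchall()
print(rows)
```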

SQL SELECT order by 2 columns and group by

Here are the SQL issued and the result set returned:
SELECT *, (UNIX_TIMESTAMP(end_time) - UNIX_TIMESTAMP(start_time)) AS T
FROM games
WHERE game_status > 10
ORDER BY game_status, T;
game_id, player_id, start_time, end_time, score, game_status, is_enabled, T
65, 22, '2009-09-11 17:50:35', '2009-09-11 18:03:07', 17, 11, 1, 752
73, 18, '2009-09-11 18:55:07', '2009-09-11 19:09:07', 30, 11, 1, 840
68, 20, '2009-09-11 18:03:08', '2009-09-11 18:21:52', 48, 11, 1, 1124
35, 18, '2009-09-11 15:46:05', '2009-09-11 16:25:10', 80, 11, 1, 2345
13, 8, '2009-09-11 12:33:31', '2009-09-11 15:21:11', 40, 11, 1, 10060
11, 5, '2009-09-11 12:22:34', '2009-09-11 15:21:42', 55, 11, 1, 10748
34, 17, '2009-09-11 15:45:43', '2009-09-11 21:00:45', 49, 11, 1, 18902
2, 1, '2009-09-10 20:46:59', '2009-09-11 23:45:21', 3, 11, 1, 97102
84, 1, '2009-09-11 23:51:29', '2009-09-11 23:51:42', 10, 12, 1, 13
I'd like to group by player_id, i.e. take the best result for each player_id, determined first by the minimum game_status and then by the time T.
So I added a GROUP BY clause, but it doesn't return the minimum:
SELECT *, (UNIX_TIMESTAMP(end_time) - UNIX_TIMESTAMP(start_time)) AS T
FROM games
WHERE game_status > 10
GROUP BY player_id
ORDER BY game_status, T;
35, 18, '2009-09-11 15:46:05', '2009-09-11 16:25:10', 80, 11, 1, 2345
13, 8, '2009-09-11 12:33:31', '2009-09-11 15:21:11', 40, 11, 1, 10060
34, 17, '2009-09-11 15:45:43', '2009-09-11 21:00:45', 49, 11, 1, 18902
1, 1, '2009-09-10 20:39:44', '2009-09-10 20:41:21', 10, 12, 1, 97
24, 12, '2009-09-11 14:46:06', '2009-09-11 14:53:30', 10, 12, 1, 444
5, 3, '2009-09-11 10:56:22', '2009-09-11 11:13:01', 11, 12, 1, 999
37, 20, '2009-09-11 15:51:13', '2009-09-11 16:15:04', 14, 12, 1, 1431
79, 31, '2009-09-11 20:34:17', '2009-09-11 20:43:29', 4, 13, 1, 552
18, 9, '2009-09-11 13:09:47', '2009-09-11 18:33:10', 2, 13, 1, 19403
72, 30, '2009-09-11 18:46:29', '2009-09-11 18:48:44', 0, 14, 1, 135
40, 22, '2009-09-11 16:12:39', '2009-09-11 16:18:23', 3, 14, 1, 344
8, 5, '2009-09-11 12:15:54', '2009-09-11 12:21:48', 25, 14, 1, 354
85, 33, '2009-09-12 01:14:01', '2009-09-12 01:20:43', 0, 14, 1, 402
22, 11, '2009-09-11 13:50:41', '2009-09-11 13:57:24', 7, 14, 1, 403
SELECT *, min(UNIX_TIMESTAMP(end_time) - UNIX_TIMESTAMP(start_time)) AS T
FROM games
WHERE game_status > 10
GROUP BY player_id
ORDER BY game_status, T;
If I select min(T), it doesn't return the row with the minimum, but the minimum value over the whole column.
I've searched for a method using a self-join, e.g. http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
It uses a subquery SELECT with min(), but I can't apply min() to two columns, as that doesn't return the specific rows I want.
select type, min(price) as minprice
from fruits
group by type;
I hope there's a way as a filter on the first SQL to remove the duplicated player_id rows.
From what I can gather, you want to see what the minimum time was on the highest game_status for a given player_id, game_id combination. Try this:
select
g1.game_id,
g1.player_id,
min(UNIX_TIMESTAMP(g1.end_time) - UNIX_TIMESTAMP(g1.start_time)) as t,
g1.game_status
from
games g1
inner join (select game_id, player_id, max(game_status) as max_status
from games where game_status > 10
group by game_id, player_id) g2 on
g1.game_id = g2.game_id
and g1.player_id = g2.player_id
and g1.game_status = g2.max_status
group by
g1.game_id,
g1.player_id,
g1.game_status
order by
g1.player_id,
g1.game_id,
g1.game_status,
T
It looks like you're missing the MIN function and a slight change to your filtering clause.
As in:
SELECT *, MIN(UNIX_TIMESTAMP(end_time) - UNIX_TIMESTAMP(start_time)) AS T
FROM games
GROUP BY player_id
HAVING MIN(UNIX_TIMESTAMP(end_time) - UNIX_TIMESTAMP(start_time)) > 10
ORDER BY game_status, T;
I moved the "> 10" logic because I believe your intent is to filter out those players whose best game status is less than ten. This is a different criterion from filtering out any individual game status entries that are less than ten (which is what you were doing via the WHERE clause).
Try it out. It looks like you're using MySQL, which is not a database system I am all that familiar with.
I'm a little unsure of some phrases in your question, but you need to do a nested SELECT operation along the following lines:
SELECT g.*
FROM (SELECT *,
(UNIX_TIMESTAMP(end_time) - UNIX_TIMESTAMP(start_time)) AS t
FROM games
) AS g
JOIN (SELECT player_id,
MIN(UNIX_TIMESTAMP(end_time) -
UNIX_TIMESTAMP(start_time)) AS min_t
FROM games
WHERE game_status > 10
GROUP BY player_id
) AS r
ON g.player_id = r.player_id AND g.t = r.min_t
ORDER BY game_status, g.t;
The 'r' query returns the player ID and the corresponding minimum time for that player; that is joined with the main table fetching all the rows for that player with the same minimum time. Generally, that will be one entry, but if someone has two games with the same time, the query will return both.
I'm not clear if there's another way of disambiguating the results set; there might be.
Thanks for the replies.
I am looking at Eric's and Jonathan's solutions.
Let me explain in more detail.
As Eric mentioned, I am after the result determined by game_status and the minimum time (T).
I only need rows with status > 10, ranked from the smallest status upward
(i.e. 11 ranks above 12, 12 above 13, and 13 above 14; there are only four statuses), with ties decided by time.
I've taken the top 5 rows of player_id = 18 from the table:
SELECT *, (UNIX_TIMESTAMP(end_time) - UNIX_TIMESTAMP(start_time)) AS T
FROM games where player_id = 18 order by game_status, T;
game_id, player_id, start_time, end_time, score, game_status, is_enabled, T
73, 18, '2009-09-11 18:55:07', '2009-09-11 19:09:07', 30, 11, 1, 840
35, 18, '2009-09-11 15:46:05', '2009-09-11 16:25:10', 80, 11, 1, 2345
53, 18, '2009-09-11 16:57:30', '2009-09-11 16:58:28', 0, 14, 1, 58
59, 18, '2009-09-11 17:27:42', '2009-09-11 17:28:51', 0, 14, 1, 69
57, 18, '2009-09-11 17:24:25', '2009-09-11 17:25:41', 0, 14, 1, 76
Player 18 played the game many times and got different results (game_status).
Now, we are taking the best result for each player.
Obviously, the best result for player 18 is
73, 18, '2009-09-11 18:55:07', '2009-09-11 19:09:07', 30, 11, 1, 840
as the status is 11 and the time is 840.
Note that his best time was in game_id = 53 (line 3 above), but we won't take this result because its status was 14. Hence, using min(UNIX_TIMESTAMP ...) won't help, as it picks 58 as the result.
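The rule described above (lowest game_status first, then shortest time within that status) can be sketched as a "no better row exists" filter. The example runs on SQLite via Python's sqlite3 module rather than MySQL, so strftime('%s', ...) stands in for UNIX_TIMESTAMP, and the CTE assumes a reasonably recent engine; treat it as one possible approach, not the definitive query. The rows are taken from the listings in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE games (game_id INTEGER, player_id INTEGER,
    start_time TEXT, end_time TEXT, score INTEGER, game_status INTEGER,
    is_enabled INTEGER)""")
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (73, 18, "2009-09-11 18:55:07", "2009-09-11 19:09:07", 30, 11, 1),
    (35, 18, "2009-09-11 15:46:05", "2009-09-11 16:25:10", 80, 11, 1),
    (53, 18, "2009-09-11 16:57:30", "2009-09-11 16:58:28", 0, 14, 1),
    (65, 22, "2009-09-11 17:50:35", "2009-09-11 18:03:07", 17, 11, 1),
    (40, 22, "2009-09-11 16:12:39", "2009-09-11 16:18:23", 3, 14, 1),
])

# Keep a row only if the same player has no row with a lower status,
# or with the same status and a shorter time.
best = conn.execute("""
WITH timed AS (
    SELECT *, strftime('%s', end_time) - strftime('%s', start_time) AS t
    FROM games
    WHERE game_status > 10
)
SELECT g.game_id, g.player_id, g.game_status, g.t
FROM timed AS g
WHERE NOT EXISTS (
    SELECT 1 FROM timed AS o
    WHERE o.player_id = g.player_id
      AND (o.game_status < g.game_status
           OR (o.game_status = g.game_status AND o.t < g.t))
)
ORDER BY g.game_status, g.t
""").fetchall()
print(best)
```

If two games of one player tie on both status and time, this filter returns both rows, much like the join on the minimum time would.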