Pull out specific word from description, but not limit entire query to that result? - sql

I have a query where I am querying for a specific word in a ticket's short description, in this case WAP. So my current query looks like
SELECT * FROM table1 T
WHERE T.short_description LIKE '%WAP%'
The problem is that this only returns rows where WAP is present in the short description, so I lose important grand totals. For example, if I have 20 airlines but only 10 of them have WAP tickets, it tells me I have only 10 airlines in total, which is not correct.
How can I rewrite this query to solve my issue? Is this where subqueries come into play? I am a beginner at SQL.
To add context: I am looking to see how many total aircraft have had tickets created due to WAP issues. In this example, if I get back 5 issues, I want to be able to divide that by the total number of airlines (in Tableau, not necessarily in the SQL statement), so 5/20 would give me what I want. However, I am currently getting 5/10 (or whatever restricted set the LIKE condition limits me to).
Thanks
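One possible way to keep every row is to turn the LIKE test into a flag column instead of a WHERE filter. The sketch below runs that idea against an in-memory SQLite table from Python. The airline column and the sample rows are assumptions made up for illustration, since the real schema of table1 is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only short_description comes from the question.
conn.execute("CREATE TABLE table1 (airline TEXT, short_description TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("AirA", "WAP offline in cabin"),
        ("AirA", "Seat power fault"),
        ("AirB", "Galley light out"),
        ("AirC", "WAP reboot loop"),
    ],
)

# Flag matching rows instead of filtering them out in WHERE,
# so every row (and every airline) stays in the result.
rows = conn.execute(
    """
    SELECT T.airline,
           T.short_description,
           CASE WHEN T.short_description LIKE '%WAP%' THEN 1 ELSE 0 END AS is_wap
    FROM table1 T
    """
).fetchall()

wap_tickets = sum(r[2] for r in rows)         # numerator: tickets mentioning WAP
total_airlines = len({r[0] for r in rows})    # denominator: all airlines
print(wap_tickets, total_airlines)
```

With a result shaped like this, a tool such as Tableau could sum is_wap for the numerator and count distinct airlines for the denominator, because the non-WAP rows are still present.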

Related

Different results BigQuery and Tableau

Hello guys, when I try to calculate the median and average in BigQuery and in Tableau I get different results, even though I am using the same numbers and rows. Is there something I should know?
For example;
In BigQuery
SELECT district, AVG(sales) FROM table_name GROUP BY district
In Tableau
Using district as a dimension and selecting the average of sales from the Marks card drop-down menu.
Surprisingly, the outputs from the two are not the same.
Does anyone know what the problem might be?
Thanks!
Hi matt_black, I just found the solution: in Tableau I had joined the data with polygon data, and that join duplicates rows in Tableau, hence the different results.
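To illustrate the effect described in that answer, here is a small sketch using an in-memory SQLite database from Python. The tables, the zone column and the numbers are all hypothetical. It assumes the polygon table has a different number of rows per joined key, which is what makes the average shift (if every row were duplicated the same number of times, the average would stay the same).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two sales rows in one district, and a polygon table
# with one vertex row for zone N1 but three vertex rows for zone N2.
conn.executescript(
    """
    CREATE TABLE sales (district TEXT, zone TEXT, sales REAL);
    CREATE TABLE polygons (zone TEXT, vertex INTEGER);
    INSERT INTO sales VALUES ('North', 'N1', 10), ('North', 'N2', 40);
    INSERT INTO polygons VALUES ('N1', 1), ('N2', 1), ('N2', 2), ('N2', 3);
    """
)

# Average over the original rows only.
plain_avg = conn.execute(
    "SELECT AVG(sales) FROM sales WHERE district = 'North'"
).fetchone()[0]

# Average after the join: the N2 row is repeated once per polygon vertex.
joined_avg, joined_rows = conn.execute(
    """
    SELECT AVG(s.sales), COUNT(*)
    FROM sales s
    JOIN polygons p ON p.zone = s.zone
    WHERE s.district = 'North'
    """
).fetchone()

print(plain_avg, joined_avg, joined_rows)
```

Comparing row counts before and after the join, as the last query does, is a quick way to check whether a join is multiplying rows.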

Reducing database load from consecutive queries

I have an application which calls the database multiple times to achieve one simple goal.
A little information about this application: in short, it scrapes data from a webpage and stores specific information from that page in a database. The important information in this query is: player name, position, kill points and class.
Player name can change or remain the same every day
Position: multiple players can sit in one position
Kill points can increase or remain the same every day
Class: there are only 2 possibilities for a name, e.g. A can change to B or remain A (and the same in reverse), but it cannot be C, D, E or F
The player name can change on any given day, and position can also change depending on the kill point increase since the last update, which brings me to the goal: search the database day by day, from the current date as far back as 2021-02-22, starting at the most recent entry for a player name and backtracking to the previous day to check whether that player name is still the same or has changed.
The main reference for detecting a change is the kill points. As the days go on, this number either stays exactly the same or increases; it can never decrease.
So now onto the implementation of this application.
The first query which runs finds the most recent entry for the player name
SELECT TOP(1) * FROM [changes] WHERE [CharacterName]=#charname AND [Territory]=#territory AND [Archived]=0 ORDER BY [Recorded] DESC
It then continues to check the previous day's entries with the following query:
SELECT TOP(1) * FROM [changes] WHERE [Territory]=#territory AND [CharacterName]=#charname AND [Recorded]=#searchdate AND ([Class] LIKE '%{Class}%' OR [Class] LIKE '%{GetOpposite(Class)}%') AND [Archived]=0
If no results are found, it then proceeds to look for an alternative name with the following query:
SELECT TOP(5) * FROM [changes] WHERE [Kills] <= #kills AND [Recorded]='{Data.Recorded.AddDays(-1):yyyy-MM-dd}' AND [Territory]=#territory AND [Mode]=#mode AND ([Class] LIKE #original OR [Class] LIKE #opposite) AND [Archived]=0 ORDER BY [Kills] DESC
The aim of the query above is to get the top 5 entries that are the closest possible matches, and then cross-reference them with the day ahead:
SELECT COUNT(*) FROM [changes] WHERE [CharacterName]=#CharacterName AND [Territory]=#Territory AND [Recorded]=#SearchedDate AND [Archived]=0
When checking the day ahead, if the character name is not found there, it is considered to be the old player name for this specific character. Otherwise, if all 5 results have been searched and every one of them is present in the day-ahead search, the name is considered to be new to the table.
From the date this application started running up to today's date, that adds up to over 400 individual queries on the database to achieve one goal.
It is also worth noting that this table grows by 14,400 - 14,500 rows each and every day.
The overall question: is it possible to combine all these queries into fewer calls to the database, reducing the number of queries and improving performance?
What you can do to improve performance will be based on what parts of the application stack you can manipulate. Things to try:
Store Less Data - Database content retrieval speed is largely based on how well the database is ordered/normalized and just how much data needs to be searched for each query. Managing a cache of prior scraped pages and only storing data when there's been a change between the current scrape and the last one would guarantee less redundant requests to the db.
Separate specific classes of data - Separating data into dedicated tables would allow you to query a specific table for a specific character, etc... effectively removing one where clause.
Reduce time between queries - Fewer incoming concurrent requests means less resource contention and faster response times to prior requests.
Use another data structure - The only reason you're using top() is because you need data ordered in some specific way (most recent, etc.). If you used an in-code data structure that keeps the data ordered and is still easy to query, you could perhaps offload some SQL requests to that structure instead of the db.
The suggestions above are not exhaustive, but what you do to improve performance is largely a function of what in the application stack you have the ability to modify.
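As a rough sketch of the "use another data structure" idea, the history for one territory could be loaded in a single query and indexed in memory, so that the day-by-day backtracking becomes dictionary lookups instead of separate database calls. The example below uses SQLite from Python purely for illustration. The column names are taken from the question, while the sample rows and the previous_name helper are hypothetical, and the Class and Mode checks and the top-5 limit are left out to keep it short.

```python
import sqlite3
from collections import defaultdict
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changes (CharacterName TEXT, Territory TEXT, Recorded TEXT, "
    "Kills INTEGER, Archived INTEGER)"
)
# Hypothetical rows: one character renamed between the two days.
conn.executemany(
    "INSERT INTO changes VALUES (?, ?, ?, ?, 0)",
    [
        ("OldName", "T1", "2021-02-22", 100),
        ("Other", "T1", "2021-02-22", 90),
        ("NewName", "T1", "2021-02-23", 105),
        ("Other", "T1", "2021-02-23", 95),
    ],
)

# One round trip: load the territory's history, then index it by day.
by_day = defaultdict(dict)  # Recorded -> {CharacterName: Kills}
for name, recorded, kills in conn.execute(
    "SELECT CharacterName, Recorded, Kills FROM changes "
    "WHERE Territory = ? AND Archived = 0",
    ("T1",),
):
    by_day[recorded][name] = kills


def previous_name(name, day):
    """Return the name this character most likely used the day before `day`."""
    prev = (date.fromisoformat(day) - timedelta(days=1)).isoformat()
    if name in by_day[prev]:
        return name  # name unchanged
    kills = by_day[day][name]
    # Candidates: present the day before, absent on `day`,
    # and with kills not above the current value (kills never decrease).
    candidates = [
        (k, n) for n, k in by_day[prev].items()
        if k <= kills and n not in by_day[day]
    ]
    # Closest match is the candidate with the highest kills, if any.
    return max(candidates)[1] if candidates else None


print(previous_name("NewName", "2021-02-23"))
```

Whether this is practical depends on how much history fits in memory for one territory. Loading a bounded date range per query would be a middle ground.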

Tableau count values after a GROUP BY in SQL

I'm using Tableau to show some schools data.
My data structure gives a table that has all the school classes in the country. The thing is, I need to count, for example, how many schools have both Primary and Preschool.
A simplified version of my table should look like this:
In that table, if I want to know the number needed in the example, the result should be 1, because only one school has both Primary and Preschool.
I want to have a multiple filter in Tableau that gives me that information.
I was thinking about the SQL query that would be needed, and it requires a GROUP BY statement. An example of the query is here in a fiddle: Database example query
In the SQL query I group by id all the schools that meet either one of the conditions inside the IN(...) and then count how many of them meet both (c=2).
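Since the fiddle itself is not reproduced here, the following is only a sketch of the kind of query described, run against an in-memory SQLite table from Python. The table name, column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (school, type of class offered).
conn.execute("CREATE TABLE classes (school_id INTEGER, school_type TEXT)")
conn.executemany(
    "INSERT INTO classes VALUES (?, ?)",
    [
        (1, "Primary"), (1, "Preschool"),
        (2, "Primary"),
        (3, "Preschool"), (3, "Secondary"),
    ],
)

# Inner query: per school, count how many of the requested types it has.
# Outer query: keep only schools that have both (c = 2) and count them.
query = """
    SELECT COUNT(*) FROM (
        SELECT school_id, COUNT(DISTINCT school_type) AS c
        FROM classes
        WHERE school_type IN ('Primary', 'Preschool')
        GROUP BY school_id
    ) AS per_school
    WHERE c = 2
"""
schools_with_both = conn.execute(query).fetchone()[0]
print(schools_with_both)
```

COUNT(DISTINCT ...) is used so that a school with several classes of the same type is not counted as having both.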
Is there a way to do something like this in Tableau? Either using groups or sets, using advanced filters or programming a RAW SQL calculated field?
Thanks!
Dubafek
PS: I've added a link to my question in Tableau's forum because you can download my testing workbook there: Tableau's forum question
I've solved the issue using LODs (specifically INCLUDE and EXCLUDE statements).
I created two calculated fields having the aggregation I needed:
Then I made a calculated field that keeps only the School IDs whose number of types (after filtering) matches the number of types selected in the multiple filter (both of the fields shown above):
Finally, I used COUNTD([Condition]) to display the number of schools matching at least the selected school types.
Hope this helps someone with a similar issue.
PS: If someone wants the workbook with the solution, I've uploaded it in an answer in the Tableau Forum.

Grand totals row not summing in Google Data Studio

Well, I'm an absolute newbie in Google Data Studio, but for some reason my grand totals row is not working.
I'm learning to use this tool, and I made an easy table with just countries and sessions.
Piece of cake. Now I just want to add a total row that sums all sessions. That's all. I activated the option Show Summary Row but it shows nothing.
Things I've tried that did not work:
Updated and refreshed.
Changed the time period and tried different dates, just in case.
Deleted and recreated the full table.
Checked the connection. I get data and the data is right, I just cannot sum it.
Changed the size and format of the table, just in case it was a problem of margins or font color.
And I know it can be done, according to different sources. I've read this question here:
Grand Total is wrong in Google Data Studio
But it did not help. In that question, a user posted an image in the comments:
As you can see, he managed to get what I'm trying to do.
So I must be doing something wrong, and I do not know what.
UPDATE 2: If I apply a filter, I get no totals. You can see my config in the right side of image.
Can anybody give me a clue of how to make a grand totals row in Google Data Studio?
Thanks
Sounds like a bug. It should be a case of selecting that tick box. Strangely, I looked at an existing table I have with totals and when I unticked the box and then ticked again, the totals didn't reappear and disappeared off another table on the page (like your example). They did reappear eventually with some refreshing of the data and page but seems like there's something wrong with them.
I don't think this is a bug; I think it is part of the design.
I actually just discovered the reason this is happening, at least for me. It doesn't actually sum the values in the table: the grand total summary of a table is a sum of whatever metric is being used, not of the actual rows shown in the chart. So if you have a dimension (like age / gender) where data thresholding is applied internally by Google, but you are using a metric such as users, you will see the grand total of the metric without the thresholding that the dimension applies.
Proof below
You can see that the grand total for column 2 should be 453.6, not the 953.6 shown. If I look at a non-thresholded dimension (country),
you can see where the 953.6 comes from: the data source supplied to the table uses 80% of all users, and 1192 * 0.8 gives 953.6, which is what the grand total is displaying. In conclusion, the only way this number is possible is that, when a thresholded dimension is used in a table with a metric, there will be a discrepancy, because the grand total does not come from the table values but from the metric's source data, which for some odd reason does not have the table's dimension applied.

SoQL query for unique values with Socrata API

I am trying to count the total unique number of heating complaints in the Socrata NYC 311 service requests database: https://data.cityofnewyork.us/Social-Services/All-Heat-complaints-to-311/m5nm-vca4
Ideally I want to use the data to populate a map with unique complaints, as well as the number of complaints by each unique address. So far I have used the following query which only returns about 2 days of data:
http://data.cityofnewyork.us/resource/m5nm-vca4.json?$select=created_date,longitude,latitude,COUNT(*)&$group=created_date,longitude,latitude&$where=complaint_type=%27heating%27
Is there any way to query the database for unique addresses across all dates and count them by total complaints?
Can you be a little more descriptive about what you're trying to get? An example of the output you're trying to get would be perfect.
I think we might be able to get what you're looking for, but we need to get the aggregation right.
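If the goal is one row per address with its complaint count, one possible shape for the aggregation is to group by the address alone rather than by created_date, since grouping by a timestamp yields close to one group per complaint. The sketch below only builds the request URL in Python and does not call the API. The incident_address column name and the $limit value are assumptions that would need checking against the dataset's actual columns and paging limits; the $where filter is copied from the question.

```python
from urllib.parse import urlencode

BASE = "https://data.cityofnewyork.us/resource/m5nm-vca4.json"

params = {
    # incident_address is an assumed column name; check the dataset schema.
    "$select": "incident_address, count(*) AS complaints",
    "$group": "incident_address",
    "$where": "complaint_type='heating'",   # filter copied from the question
    "$order": "complaints DESC",
    "$limit": 5000,                          # assumed page size
}

# safe="$" keeps the leading "$" of each SoQL parameter readable in the URL.
url = f"{BASE}?{urlencode(params, safe='$')}"
print(url)
```

Adding longitude and latitude to both $select and $group would keep the coordinates for mapping, at the cost of splitting an address into several rows if its coordinates vary.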