I am trying to count the total number of unique heating complaints in the Socrata NYC 311 service requests database: https://data.cityofnewyork.us/Social-Services/All-Heat-complaints-to-311/m5nm-vca4
Ideally I want to use the data to populate a map with unique complaints, as well as the number of complaints at each unique address. So far I have used the following query, which only returns about two days of data:
http://data.cityofnewyork.us/resource/m5nm-vca4.json?$select=created_date,longitude,latitude,COUNT(*)&$group=created_date,longitude,latitude&$where=complaint_type=%27heating%27
Is there any way to query the database for unique addresses across all dates and count the total complaints at each?
Can you be a little more descriptive about what you're trying to get? An example of the output you're after would be perfect.
I think we might be able to get what you're looking for, but we need to get the aggregation right.
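For example, grouping on the address field instead of created_date and the coordinates should give one row per address across all dates. A rough sketch along the lines of the query above, assuming the dataset exposes an incident_address column (check the dataset's field list for the real name):

http://data.cityofnewyork.us/resource/m5nm-vca4.json?$select=incident_address,count(*)&$group=incident_address&$where=complaint_type=%27heating%27

Dropping created_date from the grouping is what keeps the counts from being split up by day.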
I'm writing code to load data from the Google Ads API into a BigQuery table using Cloud Functions. The process queries a resource called ad_group_ad, but I'm struggling to validate whether there are duplicated rows at my destination.
Reading the docs, I expected to find some attribute that identifies a column or group of columns representing the table key. This question may seem obvious, but I haven't made any progress googling it.
Is there a way to identify whether there are duplicated rows? I'm not using any GROUP BY when collecting, just a simple SELECT like the example below:
SELECT
segments.ad_network_type,
campaign.name,
ad_group.name,
ad_group.id,
so
on,
and,
so,
forth
FROM ad_group_ad
WHERE segments.date = ?
The combination of ad ID and ad group ID is guaranteed to be unique.
However, as soon as you include segments in your SELECT clause, you'll get multiple rows with the same IDs. So the combination of ad_group.id and ad.id, plus whatever segment fields you need, should be a candidate key.
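A quick way to check the destination table is to group on that candidate key and look for groups with more than one row. A sketch in BigQuery SQL, where the table path and the column names (ad_group_id, ad_id, segments_date, ad_network_type) are assumptions about how the nested fields were flattened on load, so adjust them to your schema:

-- Rows where the assumed candidate key appears more than once
SELECT
  ad_group_id,
  ad_id,
  segments_date,
  ad_network_type,
  COUNT(*) AS row_count
FROM `my_project.my_dataset.ad_group_ad`  -- hypothetical destination table
GROUP BY ad_group_id, ad_id, segments_date, ad_network_type
HAVING COUNT(*) > 1;

If that returns zero rows, the load did not duplicate anything on that key.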
I have a bunch of data and I want the output to display an average of all the data points but also the individual data points in subsequent columns. Ideally it would look something like this:
Compound | Subject | Avg datapoint | Datapoint Experiment 1 | Datapoint Exp 2 | ...
XYZ      | ABC     | 40            | 20                     | 60              | ...
TUV      | ABC     | 30            | 20                     | 40              | ...
TUV      | DEF     | 20            | 10                     | 30              | ...
One problem I'm running into is that I get repetitive lines of information. Another is that some rows pull in info that doesn't apply, such that some of the individual datapoints in, say, row 2 would have info from subject DEF when I only want it to have info from subject ABC.
I hope this makes sense! I'm currently using an inner join with a ton of WHERE qualifiers. I'm close but not quite there. Any help is appreciated, and let me know if I can provide additional info to help you help me.
The SQL language has a very strict rule requiring you to know the exact number of columns for your result set in advance, before looking at any data in your tables.
Therefore, if this average is based on a known, fixed number of columns, or if the number of potential columns is reasonably small so that you can manually set up placeholders, then this will be possible. The key search term to learn how to do this is "conditional aggregation", where you may also need to join the table to itself for each field.
Otherwise, you will need to pivot and aggregate your data in your client code or reporting tool.
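As a rough sketch of conditional aggregation, assuming the raw data sits in a long-format table such as results(compound, subject, experiment, datapoint), where all of these names are illustrative rather than from the original question:

-- One output row per compound/subject, with the average plus one
-- column per (known, fixed) experiment number
SELECT
  compound,
  subject,
  AVG(datapoint) AS avg_datapoint,
  MAX(CASE WHEN experiment = 1 THEN datapoint END) AS datapoint_exp_1,
  MAX(CASE WHEN experiment = 2 THEN datapoint END) AS datapoint_exp_2
FROM results
GROUP BY compound, subject;

Each CASE expression picks out the datapoint for one experiment, and grouping by compound and subject is what keeps subject DEF's values from leaking into subject ABC's row.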
I'm using the Laravel framework and I'm trying to get the max date for each address and display it. But it doesn't work as I expected; with max it doesn't bring back the correct row.
$locations = \DB::table('mesaurement')
    ->select(\DB::raw('mesaurement.ID as ID, mesaurement.Latitude, mesaurement.Longitute, mesaurement.Temperature as temp, mesaurement.Humidity, mesaurement.Pressure, mesaurement.address, mesaurement.created_at, max(created_at) as date'))
    ->orderBy('date', 'desc')
    ->groupBy('mesaurement.address')
    ->get();
I wanted the max date to bring back the right Temperature, Humidity, etc. I grouped by address because I have a lot of rows with the same address and I want each address to appear just once.
I'm not familiar with the framework, but the query can be as simple as the one below:
select max(temp) from mesaurement
group by mesaurement.address;
More details on taking the max over multiple columns can be found at the link below:
Multiple max values in a query
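For reference, one common pattern for pulling the rest of the columns from the same row as the max date (a plain SQL sketch, not Laravel-specific) is to join the table back to the per-address max:

-- Return the full row holding the latest created_at for each address.
-- If two rows share an address and the same created_at, both come back.
SELECT m.*
FROM mesaurement AS m
JOIN (
    SELECT address, MAX(created_at) AS max_created_at
    FROM mesaurement
    GROUP BY address
) AS latest
  ON latest.address = m.address
 AND latest.max_created_at = m.created_at;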
I have a query where I'm searching for a specific word in a ticket's short description, in this case WAP. My current query looks like:
Select * FROM table1 T
WHERE T.short_description Like '%WAP%'
The problem is that this only shows rows where WAP is present in the short description, and I lose important grand totals. For example, if I have 20 airlines but only 10 of those airlines have WAP, it tells me I only have 10 total airlines, which is not correct.
How can I rewrite this query to solve my issue? Is this where subqueries come into play? I am a beginner at SQL.
To add some context, I am looking to see how many total aircraft have had tickets created due to WAP issues. In this example, if I get back 5 issues, I want to be able to divide that by the total number of airlines (in Tableau, not necessarily in the SQL statement), so 5/20 would give me what I want. However, I am currently getting 5/10 (the restricted set that the LIKE clause limits me to).
Thanks
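One option that keeps every row in play is to move the WAP test out of the WHERE clause and into a conditional count; a sketch, where the airline column is a hypothetical name for whatever you group the tickets by:

-- Count all tickets and WAP tickets per airline without filtering rows out
SELECT
  T.airline,
  COUNT(*) AS total_tickets,
  SUM(CASE WHEN T.short_description LIKE '%WAP%' THEN 1 ELSE 0 END) AS wap_tickets
FROM table1 T
GROUP BY T.airline;

That way the totals still cover all 20 airlines, and the 5/20 ratio can be computed in Tableau from the two counts.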
I'm hoping to build a Django query against my model that lets my filter change as the query progresses.
I have a model Activity that I'm querying for. Each object has a postal_code field and I'm querying for multiple zip codes stored in an array postal_codes_to_query across a date range. I'd like to ensure that I get an even spread of objects across each of the zip codes. My database has millions of Activities, so when I query with a limit, I only receive activities that match zip codes early on in postal_codes_to_query. My current query is below:
Activity.objects.filter(postal_code__in=postal_codes_to_query).filter(
    start_time_local__gte=startTime).filter(
    start_time_local__lte=endTime).order_by('start_time_local')[:10000]
If I'm searching for, say, 20 zip codes, ideally I'd like to receive 10,000 activities, with 500 activities for each zip code that I queried on.
Is this possible in Django? If not, is there some custom SQL I could write to achieve this? I'm using a Heroku Postgres database in case that matters.
You can't do this in a single query, either in Django or (as far as I know) in SQL.
The best bet is simply to iterate through the list of zips, querying for at most 500 in each one:
activities_by_zip = {}
for code in postal_codes_to_query:
    activities = Activity.objects.filter(postal_code=code).filter(
        start_time_local__gte=startTime).filter(
        start_time_local__lte=endTime).order_by('start_time_local')[:500]
    activities_by_zip[code] = activities
Of course, this is one query per zip, but I think that's the best you're going to do.