I am wondering if there is a simple / smart way to pass a query to a PostgreSQL database. I have a table whose columns look something like this:
measurementPointID | parameterA | parameterB | measurement | measurementTIME
There are some dozens of records within the database.
I would like to pass a query that retrieves data only for a set of measurementPointIDs. There are tens of thousands of measurementPointID values that I need to retrieve, and I have all of them available in, for example, a CSV file.
The query should do a GROUP BY measurementTIME and ORDER BY measurementTIME as well. One detail is that if the measurement is zero (measurement = 0) there is no row corresponding to the measurementPointID at all.
Am I trying to do something too complicated or in a stupid way?
I hope I explain this adequately.
I have a series of Google Sheets with data from an Airtable database. Several of the fields are stringified arrays with recordIds to another table.
These fields can have between 0 and n - comma separated values.
I run a create/overwrite table SELECT statement to create native BigQuery tables for reporting. This works great.
Now I need to add the recordIds to a Repeated field.
I've manually written to a repeated field using:
INSERT INTO `robotic-vista-339622.Insurly_dataset.zzPOLICYTEST` (policyID, locations, carrier)
VALUES ('12334556',[STRUCT('recordId1'),STRUCT('recordId2')], 'name of policy');
However, I need to know how to do this using a SELECT statement rather than INSERT. I also need to know how to do this when the number of recordIds retrieved from Airtable is not known in advance. One record could have none and another record could have 10 or more.
Any given sheet will look like the following, where "locations" contains the recordIds I want to add to a repeated field.
SHEETNAME: POLICIES
|policyId |carrier | locations |
|-----------|-----------|---------------------------------|
|recrTkk |Workman's | |
|rec45Yui |Workman's |recL45x32,recQz70,recPrjE3x |
|recQb17y |ABC Co. |rec5yUlt,recIrW34 |
In the above, the first row/record has no location IDs, and the subsequent rows/records have three and two respectively.
Any help is appreciated.
Thanks.
I'm unsure if answering my own question is the correct way to show that it was solved... but here is what it took.
I created a native table in BigQuery. The field for locations is a string, mode repeated.
Then I just run an overwrite table SELECT statement.
SELECT recordId,Name, Amount, SPLIT(locations) as locations FROM `projectid.datasetid.googlesheetsdatatable`;
Tested and I run linked queries on the locations with unnest.
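To make the shape of the result concrete, here is a minimal Python sketch of what splitting the "locations" cell into a repeated value looks like. It is plain Python, not BigQuery, and the function name split_locations is hypothetical. It assumes a blank cell should become an empty list; whether SPLIT gives exactly that for blank cells in your table is worth checking against your own data.

```python
def split_locations(cell: str) -> list[str]:
    """Turn a comma-separated cell such as 'recA,recB' into a list of recordIds.

    Assumption: blank cells and stray empty pieces are dropped, so a policy
    with no locations ends up with an empty list.
    """
    return [part.strip() for part in cell.split(",") if part.strip()]


# Sample cells taken from the POLICIES sheet shown in the question.
policies = {
    "recrTkk": "",
    "rec45Yui": "recL45x32,recQz70,recPrjE3x",
    "recQb17y": "rec5yUlt,recIrW34",
}
repeated = {policy_id: split_locations(cell) for policy_id, cell in policies.items()}
print(repeated)
```

Each policy keeps one row, and the number of recordIds per row can vary freely, which is the property the repeated field provides.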
What's the best way to get data that matches any one of ~100k values?
For this question, I'm using an Amazon Redshift database and have a table something like this with hundreds of millions of rows:
--------------------
| userID | c1 | c2 |
| 101000 | 12 | 'a'|
| 101002 | 25 | 'b'|
____________________
There are also millions of unique userIDs. I have a CSV list of 98,000 userIDs that I care about, and I want to do math on the columns for those specific users.
select c1, c2 from table where userID in (10101, 10102, ...)
What's the best solution to match against a giant list like this?
My approach was to write a Python script that read in the results for all users in our condition set, then filtered them against the CSV in Python. It was dead slow and wouldn't work in all scenarios though.
A coworker suggested uploading the 98k users into a temporary table, then joining against it in the query. This seems like the smartest way, but I wanted to ask if you all had ideas.
I also wondered if printing an insanely long SQL query containing all 98k users to match against and running it would work. Out of curiosity, would that even have run?
As your coworker suggests, put your IDs into a temporary table by uploading a CSV to S3 and then using COPY to import the file into a table. You can then use an INNER JOIN condition to filter your main data table on the list of IDs you're interested in.
An alternative option, if uploading a file to S3 isn't possible for you, could be to use CREATE TEMP TABLE to set up a table for your list of IDs and then use a spreadsheet to generate a whole set of INSERT statements to populate the temp table. 100K inserts could be quite slow though.
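The temp-table-plus-join idea can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module as a stand-in for Redshift (so there is no S3 upload or COPY step); the table names events and wanted and the helper rows_for_ids are hypothetical.

```python
import sqlite3


def rows_for_ids(conn, wanted_ids):
    """Return (userID, c1, c2) rows whose userID is in wanted_ids, via a temp-table join."""
    cur = conn.cursor()
    # Load the ID list into a temporary table (on Redshift this would be the COPY step).
    cur.execute("CREATE TEMP TABLE wanted (userID INTEGER PRIMARY KEY)")
    cur.executemany("INSERT INTO wanted (userID) VALUES (?)", ((i,) for i in wanted_ids))
    # The INNER JOIN keeps only rows whose userID appears in the list.
    cur.execute(
        "SELECT e.userID, e.c1, e.c2 "
        "FROM events AS e "
        "INNER JOIN wanted AS w ON w.userID = e.userID "
        "ORDER BY e.userID"
    )
    rows = cur.fetchall()
    cur.execute("DROP TABLE wanted")
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (userID INTEGER, c1 INTEGER, c2 TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(101000, 12, "a"), (101002, 25, "b"), (101005, 7, "c")],
)

# In practice the IDs would be read from the CSV file rather than written inline.
matched = rows_for_ids(conn, [101000, 101005, 999999])
print(matched)
```

IDs in the list that have no matching row simply produce no output, so the list does not need to be cleaned against the main table first.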
I want to use Google BigQuery to store the number of searches for certain keywords on my site. I created a table structure like this:
| date | keyword | number_of_searches |
| 2017-03-29 | pizza | 1 |
I want to increment the number_of_searches value if the combination of date and keyword already exists.
So you want a way to store the number of searches for a given keyword.
With BigQuery you need to change your approach somewhat.
Consider the traditional steps:
- use a SELECT to find out if there is a row for today
- if not, then INSERT one with default
- when exists, use an UPDATE statement to increment counter
The main difference with BigQuery is that it is append-only and not suited to UPDATE statements, so you need to change how you analyse the data and simplify how you collect it. Instead of the three steps above, you would do just one:
- insert a new row for each search
This way you will end up with multiple rows per keyword and day, and you aggregate them at query time to find out how many searches you had. The query would be something like this:
SELECT
myday as date,
keyword,
count(1) as number_of_search
FROM table
group by 1,2
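The append-then-aggregate pattern can be tried out locally. The sketch below uses Python's sqlite3 module purely as a stand-in for BigQuery; the table name searches and the helper log_search are hypothetical, and the GROUP BY names the columns explicitly instead of using positions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per search event; there is no counter column to update.
conn.execute("CREATE TABLE searches (myday TEXT, keyword TEXT)")


def log_search(conn, day, keyword):
    """Record a single search by appending a row."""
    conn.execute("INSERT INTO searches (myday, keyword) VALUES (?, ?)", (day, keyword))


for day, keyword in [
    ("2017-03-29", "pizza"),
    ("2017-03-29", "pizza"),
    ("2017-03-29", "pasta"),
    ("2017-03-30", "pizza"),
]:
    log_search(conn, day, keyword)

# The counts are derived at query time by grouping on day and keyword.
counts = conn.execute(
    "SELECT myday AS date, keyword, COUNT(1) AS number_of_search "
    "FROM searches "
    "GROUP BY myday, keyword "
    "ORDER BY myday, keyword"
).fetchall()
print(counts)
```

The write path never needs to know whether a row for today already exists, which is what removes the SELECT / INSERT / UPDATE sequence.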
I'm currently developing an application for a client using Visual Basic .NET. It's a rewrite of an application that accessed an Oracle database, filtered the columns and performed some actions on the data. Now, for reasons beyond my control, the client wants to use an Access (.mdb) database for the new application. The problem with this is that the tables have more than the 255 columns Access supports, so the client suggested splitting the data into multiple databases/tables.
Well even when the tables are split, at some point, I have to query all columns simultaneously (I did an INNER JOIN on both tables) which, of course, yields an error. The limit apparently is on number of simultaneously queryable columns not on the total number of columns.
Is there a possibility to circumvent the 255 columns limit somehow? I was thinking in the direction of using LINQ to combine queries of both tables, i.e. have an adapter that emulates a single table I can perform queries on. A drawback of this is that .mdb is not a first-class citizen of LINQ-to-SQL (i.e. no insert/update supported etc.).
As a workaround, I might be able to rewrite my stuff so as to only need all columns at one point (I dynamically create control elements depending on the column names in the table). Therefore I would need to query say the first 250 columns and after that the following 150.
Is there an Access SQL query that can achieve something like this? I thought of something like SELECT TOP 255 * FROM dbname or SELECT * FROM dbname LIMIT 1,250 but these are not valid.
Do I have other options?
Thanks a lot for your suggestions.
The ADO.NET DataTable object has no real limitations on the number of columns that it could contain.
So, once you have split the big table into two subtables with fewer columns and set the same primary key in both, you can use the DataTable.Merge method on the VB.NET side.
In their example on MSDN they show two tables with the same schema merged together, but it works also if you have two totally different schemas, but just the Primary key in common
Dim firstPart As DataTable = LoadFirstTable()
Dim secondPart As DataTable = LoadSecondTable()
firstPart.Merge(secondPart)
I have tested this just with only one column of difference, so I am not very sure that this is a viable solution in terms of performance.
As far as I know, there is no way to directly bypass this problem using Access.
If you cannot change the database, the only way I can think of is to write a wrapper that knows where each field lives, automatically splits the query into several queries, and then regroups the results into a custom class containing all the columns for every row.
For example, you can split every table into several tables, duplicating the fields you are filtering on.
TABLEA
Id | ConditionFieldOne | ConditionFieldTwo | Data1 | Data2 | ... | DataN |
into
TABLEA_1
Id | ConditionFieldOne | ConditionFieldTwo | Data1 | Data2 | ... | DataN/2 |
TABLEA_2
Id | ConditionFieldOne | ConditionFieldTwo | Data(N/2)+1 | Data(n/2)+2 | ... | DataN |
and a query such as
SELECT * FROM TABLEA WHERE ConditionFieldOne = 'condition'
becomes, with the wrapper,
SELECT * FROM TABLEA_1 WHERE ConditionFieldOne = 'condition'
SELECT * FROM TABLEA_2 WHERE ConditionFieldOne = 'condition'
and then join the results.
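A rough sketch of such a wrapper is shown below. It uses Python and sqlite3 only to illustrate the idea, not VB.NET or Access; the function query_all_parts, the PARTS list, and the tiny two-column sample tables are all hypothetical. It runs the same filter against each split table and regroups the rows by Id.

```python
import sqlite3

# Hypothetical split tables; each carries Id and the condition field.
PARTS = ["TABLEA_1", "TABLEA_2"]


def query_all_parts(conn, condition_value):
    """Run the same filter on every split table and merge the rows by Id."""
    merged = {}
    for table in PARTS:
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE ConditionFieldOne = ?", (condition_value,)
        )
        columns = [d[0] for d in cur.description]
        for row in cur:
            record = dict(zip(columns, row))
            # Rows with the same Id are combined into one wide record.
            merged.setdefault(record["Id"], {}).update(record)
    return merged


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLEA_1 (Id INTEGER PRIMARY KEY, ConditionFieldOne TEXT, Data1 TEXT)")
conn.execute("CREATE TABLE TABLEA_2 (Id INTEGER PRIMARY KEY, ConditionFieldOne TEXT, Data2 TEXT)")
conn.executemany("INSERT INTO TABLEA_1 VALUES (?, ?, ?)", [(1, "condition", "x"), (2, "other", "p")])
conn.executemany("INSERT INTO TABLEA_2 VALUES (?, ?, ?)", [(1, "condition", "y"), (2, "other", "q")])

result = query_all_parts(conn, "condition")
print(result)
```

Because no single query ever selects more columns than one split table holds, the per-query column limit is never hit; the wide row only exists in application memory.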
I built an MS Access database that takes survey data to create a custom report. The survey application that was used does not give us the reports we need. I usually grab the data (Excel), import it into Access, and build the reports the way we need them.
For the first time, we have people redoing the survey because they are updating something or they forgot to add something. I need to be able to grab the most recent survey data so we don't get a duplicate when we run the report. (My main report is composed of several subreports. Some subreports will not be visible if null, and any questions not answered are hidden and shrunk to prevent bulky reports with unnecessary whitespace.)
record ID (PK) | FName | LName | IDNum | Completed
1 | Bob | Smith | 57 | 3/31/2013 5:00pm
2 | Bob | Smith | 57 | 3/31/2013 7:00pm
I want record ID 2 or the one that was completed at 7pm.
The queries and reports are already completed, so I have been trying to find a way to add a line of code in the criteria line of my query to grab the most recent record if the IDNum matches more than one record.
I have been trying to find the best way to do it for the past several hours. I don't think modifying my table into a 'table without duplicates' will work, because once the database is complete, someone less technical will be using it. All they are going to do is import a new Excel file to overwrite the table, and the queries do everything to build the report. I don't want to manually delete the duplicate records either.
I know I need to do something along the lines with
IIF(count(IDNum)>1, *something, *something)
*I get stuck on the true and false part. How do I tell Access that it needs to check within the table again to find the record with the larger primary key?
I thought this was going to be easy but I guess I was wrong. lol
I am fairly new to MS Access, so I know I am not using its full potential and I might be approaching this from the wrong angle. Any advice would be greatly appreciated.
I'm a student going into Info Systems, so I would really like to learn how to do this.
I believe the query you are looking for is
SELECT t1.*
FROM YourTable t1 INNER JOIN
(SELECT IDNum, MAX(Completed) AS MaxOfCompleted
FROM YourTable GROUP BY IDNum
) t2
ON t1.IDNum = t2.IDNum AND t1.Completed = t2.MaxOfCompleted;
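The join against the per-IDNum maximum can be checked on a small sample. The sketch below runs the same query shape through Python's sqlite3 module rather than Access. It assumes the Completed values are stored as ISO-style text so that MAX orders them chronologically, and the third row (Ann Jones) is made-up data added only to show a second group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable ("
    "RecordID INTEGER PRIMARY KEY, FName TEXT, LName TEXT, IDNum INTEGER, Completed TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", "Smith", 57, "2013-03-31 17:00"),
        (2, "Bob", "Smith", 57, "2013-03-31 19:00"),
        (3, "Ann", "Jones", 58, "2013-03-30 09:00"),  # illustrative extra respondent
    ],
)

# The subquery finds the latest Completed value per IDNum;
# the join keeps only the rows that match that value.
latest = conn.execute(
    """
    SELECT t1.RecordID, t1.IDNum, t1.Completed
    FROM YourTable AS t1
    INNER JOIN (
        SELECT IDNum, MAX(Completed) AS MaxOfCompleted
        FROM YourTable
        GROUP BY IDNum
    ) AS t2
    ON t1.IDNum = t2.IDNum AND t1.Completed = t2.MaxOfCompleted
    ORDER BY t1.RecordID
    """
).fetchall()
print(latest)
```

If two submissions for the same IDNum share exactly the same Completed value, both rows come back, so a tie-breaker would be needed in that case.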
Also, when you use the immediate-if function, it should be IIf, not IFF.
I'd recommend a correlated subquery, such as the following:
SELECT
Data.RecordID
, Data.FName
, Data.LName
, Data.IDNum
, Data.Completed
FROM
Data
WHERE
Data.Completed IN
(
SELECT TOP 1
DataSQ.Completed
FROM
Data as DataSQ
WHERE
DataSQ.IDNum = Data.IDNum
GROUP BY
DataSQ.Completed
ORDER BY
DataSQ.Completed DESC
)
GROUP BY
Data.RecordID
, Data.FName
, Data.LName
, Data.IDNum
, Data.Completed
;
Explanation
Instead of using a function such as Max or IIF, you can embed another SELECT query within the WHERE clause of your main query. The nested query is used to determine the most recent Completed date for every IDNum. Unlike selecting the most recent survey directly from your table with SELECT TOP 1 + ORDER BY, which would only return one record, the WHERE clause in your nested query refers back to the main query and produces a result for each IDNum. This is known as the Top N per Group pattern, and I've found it to be very useful. Note that in the nested query you will need to use a table name alias so that Access will be able to differentiate between the two queries.
Also, I'd generally recommend against trying to use a table PK to perform sorts. There are many cases when the PK order value will not be a good indicator of the values of related fields.
This code worked when tested on dummy data - best of luck!