I am a newbie server-side web developer. I have developed an app for a shop to manage its orders. Here is my question.
I have an order table like:
orderId, orderNumber, …
and an orderProduct table like:
orderProductId, productId, productNumber, productName, productDescription.
I have a search function that gets all orders matching a search string. The API looks like:
GET /api/orders?productNumberSearch={searchStr}&productNameSearch={searchStr2}&productDescriptionSearch={searchStr3}
My implementation looks like this:
String queryStr1 = getParameterFromRequestWithDefault("productNumberSearch", "");
String queryStr2 = getParameterFromRequestWithDefault("productNameSearch", "");
String queryStr3 = getParameterFromRequestWithDefault("productDescriptionSearch", "");
// loads the entire orderProduct table into memory
List<OrderProduct> orderProducts = getAllOrderProductsFromDatabase();
// filters in memory, then collects the matching order ids
// (assumes OrderProduct exposes the owning order's id)
List<Integer> filterOrderIds = orderProducts.stream()
        .filter(item -> item.getNumber().contains(queryStr1)
                && item.getName().contains(queryStr2)
                && item.getDescription().contains(queryStr3))
        .map(OrderProduct::getOrderId)
        .distinct()
        .collect(Collectors.toList());
List<Order> orders = getOrdersByIds(filterOrderIds);
I use Spring MVC and MySQL. The code above works. However, when many requests arrive at the same time, an out-of-memory error is thrown. And since there are Chinese characters in the database, MySQL full-text search does not work well.
So is there another way to implement the search function without Elasticsearch?
I have written an application to scrape a huge set of reviews. For each review I store the review itself in Review_Table(User_Id, Trail_Id, Rating), the Username (Id, Username, UserLink), and the Trail, which is built previously in the code (Id, ...60 other attributes).
for (Element card : reviewCards) {
    String userName = card.select("expression").text();
    String userLink = card.select("expression").attr("href");
    String userRatingString = card.select("expression").attr("aria-label");
    Double userRating;
    if (userRatingString.equals("NaN Stars")) {
        userRating = 0.0;
    } else {
        userRating = Double.parseDouble(userRatingString.replaceAll("[^0-9.]", ""));
    }
    User u;
    Rating r;
    // probably this is the bottleneck
    if (userService.getByUserLink(userLink) != null) {
        u = new User(userName, userLink, new HashSet<Rating>());
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    } else {
        u = userService.getByUserLink(userLink);
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    }
    i = i + 1;
    ratingSet.add(r);
    userSet.add(u);
}
saveToDb(userSet, t, link, ratingSet);
savedEntities = savedEntities + 1;
log.info(savedEntities + " Saved Entities");
} // closes an outer loop whose header is not shown in this snippet
The code works fine for small to medium-sized datasets, but I encounter a huge bottleneck for larger ones. Suppose I have 13K user entities already stored in the Postgres DB and another batch of 8,500 reviews comes in to be scraped: I have to check, for every review, whether the user of that review is already stored. This is taking forever.
I tried to define an index on the UserLink attribute in Postgres, but the speed didn't improve at all.
I tried to collect all the users stored in the DB into a set and use the contains method to check whether a particular user already exists in the set (this way I thought I could bypass the database bottleneck of 8K writes and reads, but riskily, because if there were too many users in the DB table I would run out of memory). The speed, again, didn't improve.
At this point I don't have any other ideas for improving this.
Well, for one, you would certainly benefit from not querying for each user individually in a loop. What you can do is query and cache only the UserLink or UserName, i.e. fetch and cache the complete set of just one of them, because that's all you seem to need to differentiate in the if-else.
You can actually query for individual fields with a Spring Data JPA @Query directly, or even with Spring Data JPA projections to select a subset of fields if needed, and cache and use them for the lookup. If you think the users could run into the millions or billions, then you could consider a distributed cache like Apache Ignite, where your collection could scale easily.
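For illustration, here is a minimal sketch of such a single-field lookup (the entity and repository names are assumed, not taken from your post):
import java.util.Set;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

public interface UserRepository extends JpaRepository<User, Long> {

    // fetch only the userLink column once, instead of one query per review
    @Query("select u.userLink from User u")
    Set<String> findAllUserLinks();
}
Load that set once before the scraping loop, then replace each userService.getByUserLink(userLink) call with a knownLinks.contains(userLink) check.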
Btw, the if-else seems to be inverted, is it not?
Next, don't store each review individually, as the above code appears to do. You can write in batches. Also, since you are using Postgres, you can use the CopyManager that the Postgres JDBC driver provides for bulk data transfer, wiring it in via a Spring Data custom repository. For example, you can keep appending to a local text/CSV file on a set schedule (every x minutes), use that file to write the batched rows to the table (after those x minutes), and then remove the file. This would be really quick.
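A rough sketch of what the CopyManager call could look like inside a custom repository method (the file name and column list are assumptions):
import java.io.FileReader;
import java.sql.Connection;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public long copyUsers(Connection connection) throws Exception {
    // the Postgres JDBC driver exposes COPY through PGConnection
    CopyManager copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
    try (FileReader reader = new FileReader("users-batch.csv")) { // hypothetical batch file
        // streams the whole CSV into the table in a single round trip
        return copyManager.copyIn(
                "COPY users (username, user_link) FROM STDIN WITH (FORMAT csv)", reader);
    }
}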
The other option is to write a stored procedure that combines the above, and invoke it, again via a custom repository.
Please let me know which one you'd like elaborated.
UPDATE (Jan 12 2022):
One other item I missed: when querying for UserLink or UserName, you can use a very efficient form of select query that Postgres supports, instead of an IN clause, like below:
#Select("select u from user u where u.userLink = ANY('{:userLinks}'::varchar[])", nativeQuery = true)
List<Users> getUsersByLinks(#Param("userLinks") String[] userLinks);
I have an API using Spring Boot that returns data from my MySQL DB.
I would like to send in a parameter (to keep it simple, as part of the URI) so that only x records are returned.
My question is:
Is it easier to just return all the records in the Spring Boot app, loop through them, and return x records via an ArrayList, or
is there an actual method I can call, with either JPA or the standard CRUD superclass, to get the correct result?
You can use a native query in your repository.
For example, say you have a controller named fetch_data_controller, a repository named fetch_data_repository, and a table named fetch_data_table from which you have to fetch only specific data.
In fetch_data_repository write the query as follows:
@Query(value = "SELECT col_1, col_2 FROM fetch_data_table WHERE validation = 1", nativeQuery = true)
List<Map<String, String>> fetch_data_func();
In fetch_data_controller write code as follows:
List<Map<String,String>> fetched_data = fetch_data_repository.fetch_data_func();
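Since the question is about returning only x records, the same native query can also take the row count as a parameter and pass it to MySQL's LIMIT clause; a sketch, reusing the hypothetical names above:
// in fetch_data_repository
@Query(value = "SELECT col_1, col_2 FROM fetch_data_table WHERE validation = 1 LIMIT :maxRows", nativeQuery = true)
List<Map<String, String>> fetch_data_func_limited(@Param("maxRows") int maxRows);

// in fetch_data_controller, with count taken from the URI
List<Map<String, String>> fetched_data = fetch_data_repository.fetch_data_func_limited(count);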
I am trying to implement pagination using nextPageToken.
I have this table:
CREATE TABLE IF NOT EXISTS categories
(
id BIGINT PRIMARY KEY,
name VARCHAR(30) NOT NULL,
parent_id BIGINT REFERENCES categories (id)
);
so I have the entity Category:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private Long id;

@Column(name = "name")
private String name;

@OneToOne
@JoinColumn(name = "parent_id", referencedColumnName = "id")
private Category category;
I don't really understand what I have to do next.
The client requests a token (which keeps what, exactly?).
Assume I have this controller:
@GetMapping
public ResponseEntity<CategoriesTokenResponse> getCategories(
        @RequestParam String nextPageToken
) {
    return ResponseEntity.ok(categoryService.getCategories(nextPageToken));
}
Service:
public CategoriesTokenResponse getCategories(String nextPageToken) {
    return new CategoriesTokenResponse(categoryDtoList, "myToken");
}
@Data
@AllArgsConstructor
public class CategoriesTokenResponse {
    private final List<CategoryDto> categories;
    private final String token;
}
How do I implement the SQL query for that? And how do I generate the nextPageToken for each id?
SELECT * FROM categories WHERE parent_id = what?
AND max(category id from previous page = token?)
ORDER BY id LIMIT 20;
First, you need to understand what you are working with here. Every implementation has some sort of limitation or inefficiency. For example, using page tokens like that is only good for infinite scroll pages. You can't jump to any specific page. So if my browser crashes and I'm on page 100, I have to scroll through 100 pages AGAIN. It is faster for massive data sets for sure, but does that matter if you need access to all pages? Or if you limit the return to begin with? Such as only getting the first 30 pages?
Basically, decide this: do you only care about the first few pages because search/sort is always in use (like a user never going past the first 1-5 pages of Google results), and is that data set large? Then it's a great use case. Will a user select "all items in the last 6 months" and actually need all of them, or is the sort/search weak? Or will you return all pages rather than capping the return at, say, 30 pages? Or is development speed more important than a 0.1-3 second speed increase (depending on data size)? Then go with the built-in JPA Page objects.
I have used Page objects on 700k records with less than a second of speed change compared to 70k records. Based on that, I don't see removing OFFSET adding a ton of value unless you plan for a huge data set. I just tested a new system I'm making with Pageable: it returned 10 items on page 1 in 84 milliseconds, with no page limiter, for 27k records, over a VPN into my work network from my house. A table with over 500k records took 131 milliseconds. That's pretty fast. Want to make it faster? Force a total max return of 30 pages and a max of 100 results per page, because normally users don't need all the data in that table. They want something else? Refine the search. The speed difference between this and seek/key-style paging is less than a second. This is assuming a normal SQL database, too; NoSQL is a bit different here. Baeldung has a ton of articles on JPA paging, like the following: https://www.baeldung.com/rest-api-pagination-in-spring
JPA paging should take no more than 30 minutes to learn and implement; it's extremely easy and comes stock on JPA repositories. I strongly suggest using that over seek/key-style paging, as you likely aren't building a system like Google's or Facebook's.
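For reference, a minimal sketch of the built-in paging (the repository name is assumed; Spring Data generates the query):
import java.util.List;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.repository.JpaRepository;

// a standard Spring Data repository for the Category entity from the question
public interface CategoryRepository extends JpaRepository<Category, Long> {
}

// usage: fetch page 0 with 20 items, ordered by id
Page<Category> page = categoryRepository.findAll(PageRequest.of(0, 20, Sort.by("id")));
List<Category> categories = page.getContent();
boolean hasNext = page.hasNext(); // drives the "next page" link or token
long total = page.getTotalElements();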
If you absolutely want to go with the seek/key style paging there's a good informational page here:
https://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/
In general, what you are looking for is using jOOQ with Spring. Implementation example here:
https://docs.spring.io/spring-boot/docs/1.3.5.RELEASE/reference/html/boot-features-jooq.html
Basically, create a DSL context:
private final DSLContext dsl;

@Autowired
public JooqExample(DSLContext dslContext) {
    this.dsl = dslContext;
}
Then use it like so:
dsl.select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15) // (!)
   .limit(10)
   .fetch();
Instead of explicitly phrasing the seek predicate, just pass the last record from the previous query, and jOOQ will see that all records before and including this record are skipped, given the ORDER BY clause.
Currently, I see no existing support for pagination in the graphql-java library. It does have some basic Relay support, wherein we can create a Connection, Facebook's recommended way of implementing pagination.
This is the method which helps achieve that. However, with no documentation, I'm finding it hard to understand how this function works. Can someone break down the steps they would take to add pagination support if they already have an existing model which allows basic queries like add, delete, fetch, etc. using the graphql-java library?
You don't even need Relay connections to support pagination. Your query could simply accept a page number and size (or limit/offset) as arguments and return a list - done.
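For instance, a plain list-returning field with limit/offset arguments might look like this (a minimal sketch against the graphql-java builder API; bookType and fetchBooks are assumed to exist in your code):
import graphql.Scalars;
import graphql.schema.GraphQLArgument;
import graphql.schema.GraphQLFieldDefinition;
import graphql.schema.GraphQLList;

GraphQLFieldDefinition booksField = GraphQLFieldDefinition.newFieldDefinition()
        .name("books")
        .type(new GraphQLList(bookType)) // your existing Book object type
        .argument(GraphQLArgument.newArgument().name("limit").type(Scalars.GraphQLInt).build())
        .argument(GraphQLArgument.newArgument().name("offset").type(Scalars.GraphQLInt).build())
        .dataFetcher(env -> {
            Integer limit = env.getArgument("limit");
            Integer offset = env.getArgument("offset");
            return fetchBooks(limit, offset); // hypothetical DB call
        })
        .build();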
But, if you wanted Relay connection for e.g. Book type, you'd do something like the following:
Relay relay = new Relay();
GraphQLOutputType book = ...; //build your normal Book object type
GraphQLObjectType bookEdge = relay.edgeType(book.getName(), book, null, Collections.emptyList());
GraphQLObjectType bookConnection = relay.connectionType(book.getName(), bookEdge, Collections.emptyList());
As a result, you'd have a BookConnection type that conforms to the Relay connection spec.
As for the example with basic GraphQL, you have a simple web app here.
The connection spec naturally fits a data store that supports cursor based pagination, but needs some creativity when used with different pagination styles.
1) If you wish to use simple offset-based paging, you can decide to treat after as the offset (meaning a number would be passed) and first as the limit:
SELECT * FROM books ORDER BY timestamp OFFSET $after LIMIT $first
The same for before and last, just different direction.
2) Another way is to treat after/before as the last seen value of the sort column (so an actual (obfuscated) value would be passed):
SELECT * FROM books WHERE timestamp > $after ORDER BY timestamp LIMIT $first
I'd also recommend you take a look at my project, graphql-spqr, with an example app, which makes developing GraphQL APIs dead simple.
For example, you'd create a paginated result like this:
public class BookService {

    @GraphQLQuery(name = "books")
    //make sure the argument names and types match the Relay spec
    public Page<Book> getBooks(@GraphQLArgument(name = "first") int first, @GraphQLArgument(name = "after") String after) {
        //if you decide to fetch from a SQL DB, you need the limit and offset instead of a cursor,
        //so you can treat "first" as the count and "after" as the offset
        int offset = Integer.parseInt(after);
        List<Book> books = getBooksFromDB(first, offset);
        Page<Book> bookPage = PageFactory.createOffsetBasedPage(books, totalBookCount, offset);
        return bookPage;
    }
}
There are many other ways to create a Page instance; this is just the most straightforward one.
You'd then generate a schema from your Java class:
GraphQLSchema schema = new GraphQLSchemaGenerator()
.withOperationsFromSingleton(new BookService())
.generate();
GraphQL graphQL = GraphQLRuntime.newGraphQL(schema).build();
And execute a query:
ExecutionResult result = graphQL.execute("{books(first:10, after:\"20\") {" +
" pageInfo {" +
" hasNextPage" +
" }," +
" edges {" +
" cursor, node {" +
" title" +
"}}}}");
But, again, if you are not using Relay, there's really no need to overcomplicate things. If your storage supports cursor-based pagination naturally, go for it. If it doesn't, just use simple limit/offset arguments and return a list, and forget the connection spec. It was created to enable Relay to automatically manage paging in various scenarios, so it's almost always total overkill if you're not using Relay and/or a DB with cursor-based pagination.
I have just started learning Couchbase. I am trying to write a basic query using the Java SDK, but I am not able to understand how to write it. Below is the query:
SELECT *
FROM users_with_orders usr
JOIN orders_with_users orders
ON KEYS ARRAY s.order_id FOR s IN usr.shipped_order_history END
This is what I have for joining without an array:
LetPath path = select("*, META(usr).id AS _ID, META(usr).cas AS _CAS").from(bucketName + " usr").join(bucketName + " orders").onKeys("usr.order_id");
How should I proceed with the above query for ON KEYS ARRAY?
Thanks!!!!
As described in the docs on Querying from the SDK, you can use either a simple string with the Java SDK or use the DSL. For example:
// query with a simple string
System.out.println("Simple string query:");
N1qlQuery airlineQuery = N1qlQuery.simple("SELECT `travel-sample`.* FROM `travel-sample` WHERE name=\"United Airlines\" AND type=\"airline\"");
N1qlQueryResult queryResult = bucket.query(airlineQuery);
for (N1qlQueryRow result: queryResult) {
System.out.println(result.value());
}
//query with a parameter using the DSL
System.out.println("Parameterized query using the DSL:");
Statement statement = select(path(i("travel-sample"), "*")).from(i("travel-sample")).where(x("name").eq(x("$airline_param")).and(x("type").eq(s("airline"))));
JsonObject placeholderValues = JsonObject.create().put("airline_param", "United Airlines");
N1qlQuery airlineQueryParameterized = N1qlQuery.parameterized(statement, placeholderValues);
N1qlQueryResult queryResultParameterized = bucket.query(airlineQueryParameterized);
for (N1qlQueryRow row : queryResultParameterized) {
System.out.println(row);
}
(I posted a full gist of this example for the imports, etc.)
See the docs for more info, but you may want to use the DSL to allow IDE code completion and Java compile time checking. When developing an interactive web application, you'll probably also want to use parameterized statements (for security) and may even want prepared statements (for performance).
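Applying the simple-string approach to the exact join from the question would look something like this (a sketch; the bucket names come from the question):
// the ON KEYS ARRAY ... FOR ... END clause can be passed verbatim in the query string
N1qlQuery joinQuery = N1qlQuery.simple(
        "SELECT * FROM users_with_orders usr " +
        "JOIN orders_with_users orders " +
        "ON KEYS ARRAY s.order_id FOR s IN usr.shipped_order_history END");
N1qlQueryResult joinResult = bucket.query(joinQuery);
for (N1qlQueryRow row : joinResult) {
    System.out.println(row.value());
}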