I'm working on a small side-project for our company that does the following:
1. PDF-based documents received through Office 365 Outlook are temporarily stored in OneDrive, using Power Automate
2. Text data is extracted from the PDFs using a few Java libraries
3. Based on the extracted data, an appropriate filename and file path are created
4. The PDFs are permanently saved in OneDrive
The issue right now is that my Java program runs locally, i.e. steps 2, 3 and 4 require code to run 24/7 on my PC. I'd like to transition to a cloud-based solution.
What is the easiest way to accomplish this? The solution doesn't have to be free, but shouldn't cost more than $20/mo. Our company already has an Azure subscription, though I'm not familiar yet with Azure.
What you are looking for is a solution that uses a serverless computing execution model. Azure Functions seems to be a possible choice here. It appears to have input bindings that respond to OneDrive files, and likewise output bindings.
The cost will depend on the number of documents, not the time the solution is available. I assume we are talking about a small number of documents a month so this will come out cheaper than other execution models.
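If it helps, here is a very rough sketch of what such a function could look like in Java. It assumes you have Power Automate POST the PDF content (base64-encoded, say) to an HTTP-triggered function rather than relying on OneDrive bindings; the class name, function name, payload shape and helper method are placeholders, not a definitive implementation:

```java
// Rough sketch only: an HTTP-triggered Azure Function in Java.
// Power Automate would POST the PDF (assumed base64-encoded here) and get back
// the computed filename/path. All names and the payload shape are placeholders.
import java.util.Base64;
import java.util.Optional;

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.HttpMethod;
import com.microsoft.azure.functions.HttpRequestMessage;
import com.microsoft.azure.functions.HttpResponseMessage;
import com.microsoft.azure.functions.HttpStatus;
import com.microsoft.azure.functions.annotation.AuthorizationLevel;
import com.microsoft.azure.functions.annotation.FunctionName;
import com.microsoft.azure.functions.annotation.HttpTrigger;

public class ProcessPdfFunction {

    @FunctionName("ProcessPdf")
    public HttpResponseMessage run(
            @HttpTrigger(name = "req",
                         methods = {HttpMethod.POST},
                         authLevel = AuthorizationLevel.FUNCTION)
            HttpRequestMessage<Optional<String>> request,
            final ExecutionContext context) {

        String base64Pdf = request.getBody().orElse("");
        if (base64Pdf.isEmpty()) {
            return request.createResponseBuilder(HttpStatus.BAD_REQUEST)
                          .body("No PDF content received")
                          .build();
        }

        byte[] pdfBytes = Base64.getDecoder().decode(base64Pdf);

        // Reuse your existing text-extraction and naming logic here.
        String targetPath = deriveTargetPath(pdfBytes);

        return request.createResponseBuilder(HttpStatus.OK)
                      .body(targetPath)
                      .build();
    }

    private String deriveTargetPath(byte[] pdfBytes) {
        // Placeholder for the extraction + naming rules you already have.
        return "Documents/example.pdf";
    }
}
```

You would plug your existing extraction libraries into the placeholder method and let the flow use the returned path when it saves the file back to OneDrive.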
I'm planning on using Amazon S3 to store millions of relatively small files (~100 kB to 2 MB). To save on upload time I structured them into directories (tens/hundreds of files per directory), and decided to use TransferManager's uploadDirectory/uploadFileList. However, after uploading an individual file I need to perform specific operations on my HDD and DB. Is there any way (preferably implementing observers/listeners) to be notified whenever a specific file has finished uploading, or am I cursed with only being able to verify whether the entire MultipleFileUpload succeeded?
For whatever it's worth I'm using the Java SDK, however I should be able to adapt a .NET/REST solution to my needs.
I realize this isn't exactly what you asked, but it's pretty sweet and seems like an appropriate solution...
S3 does have notifications you can configure to alert you when an object has been created or deleted (or if a reduced redundancy object was lost). These can go to SQS, SNS, or Lambda (which could potentially even run the code that updates the database), and of course if you send them to SNS you can then fan them out to multiple destinations.
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#notification-how-to-event-types-and-destinations
Don't make the mistake, however, of selecting only the upload subtype you assume is being used; use the s3:ObjectCreated:* event unless you have a specific reason not to.
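Since you are already on the Java SDK, here is a rough sketch of setting that notification up programmatically with the v1 SDK (you can also just do it once in the S3 console); the bucket name and queue ARN below are placeholders:

```java
// Sketch: send an s3:ObjectCreated:* notification to an SQS queue whenever an
// object finishes uploading. Bucket name and queue ARN are placeholders.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketNotificationConfiguration;
import com.amazonaws.services.s3.model.QueueConfiguration;

public class ConfigureUploadNotifications {

    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        BucketNotificationConfiguration config = new BucketNotificationConfiguration();
        // Catch every ObjectCreated subtype (Put, Post, Copy, multipart), as advised above.
        config.addConfiguration("uploads-finished",
                new QueueConfiguration("arn:aws:sqs:us-east-1:123456789012:uploads-queue",
                                       "s3:ObjectCreated:*"));

        s3.setBucketNotificationConfiguration("my-upload-bucket", config);
    }
}
```

A consumer reading that queue (or an SNS subscriber, or a Lambda function) would then do the per-file HDD and DB work as each object lands.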
I'm trying to work on an app which uses GTFS. This may seem like a stupid question but I couldn't find any answer to it.
The GTFS feed for Israel, a rather small country without that much bus infrastructure, is around a 120 MB zipped file.
Right now the only possible way I could think of for getting it working is to download the file, but downloading 120 MB using the phone could take quite a long time. Sure you can do this only once and save it in a database on the phone, but it still requires downloading 120 MB.
Since it is zipped, I can't unzip it on the server and then just fetch the .txt files.
So basically I'm asking, How can I get the information to the phone, without downloading the zipped file?
I've seen and used apps which use that same GTFS file, and they load up really fast, even on the first load.
I hope you understand my issue, not sure how to explain it better.
Thanks!
P.S. I would make an iPhone app too, and it's the same issue, hence the iPhone tag.
One approach might be to preprocess the GTFS data during your app development. You could load it into a SQLite database and use Core Data (on iOS) to pull the data you need out of it at runtime. This also gives you an opportunity to include only the data that your app actually needs - it doesn't make sense to ask users to download extra data that they won't use.
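If you go that route, the development-time preprocessing step might look roughly like this, sketched in Java with the sqlite-jdbc driver; the file names, table schema and column order are illustrative, and a real importer should map columns by the CSV header and use a proper CSV parser:

```java
// Development-time preprocessing: pull stops.txt out of the GTFS zip and load it
// into a SQLite database that ships with the app. Paths and schema are illustrative.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class GtfsToSqlite {

    public static void main(String[] args) throws Exception {
        try (ZipFile gtfs = new ZipFile("israel-gtfs.zip");
             Connection db = DriverManager.getConnection("jdbc:sqlite:gtfs.db")) {

            db.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS stops (stop_id TEXT PRIMARY KEY, "
                    + "stop_name TEXT, stop_lat REAL, stop_lon REAL)");

            ZipEntry stopsEntry = gtfs.getEntry("stops.txt");
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                         gtfs.getInputStream(stopsEntry), StandardCharsets.UTF_8));
                 PreparedStatement insert = db.prepareStatement(
                         "INSERT INTO stops VALUES (?, ?, ?, ?)")) {

                reader.readLine(); // skip the CSV header row
                String line;
                while ((line = reader.readLine()) != null) {
                    // Naive split; assumes the column order stop_id,stop_name,stop_lat,stop_lon
                    // and no quoted commas. Use a real CSV parser for production feeds.
                    String[] f = line.split(",");
                    insert.setString(1, f[0]);
                    insert.setString(2, f[1]);
                    insert.setDouble(3, Double.parseDouble(f[2]));
                    insert.setDouble(4, Double.parseDouble(f[3]));
                    insert.addBatch();
                }
                insert.executeBatch();
            }
        }
    }
}
```

Repeat for the handful of GTFS files your app actually needs (routes, trips, stop_times, and so on) and ship the resulting database with the app instead of the 120 MB zip.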
Use the protocol buffer binary format (PBF), formerly Google-internal and now open source. It is compact and very fast to search, so there is no need to decompress it on the device and load it into a database there, because the PBF file effectively acts as the database. Just include a PBF library in your code to query it. Of course, you have to generate the compressed file once before distributing the data online.
I have a problem I've been dealing with lately. My application asks its users to upload videos, to be shared with a private community. They are teaching videos, which are not always optimized for web quality to start with. The problem is, many of the videos are huge, way over the 50 megs I've seen in another question. In one case, a video was over a gig, and the only solution I had was to take the client's video from box.net, upload it to the video server via FTP, then associate it with the client's account by updating the database manually. Obviously, we don't want to deal with videos this way, we need it to all be handled automatically.
I've considered using either the box.net or Dropbox API to facilitate large uploads, but would rather not go that way if I don't have to. We're using PHP for the main logic of the site, though I'm comfortable with many other languages, especially Python, but also Java, C++, and Perl. If I have to dedicate a whole server or server instance to handling the uploads, I will.
I'd rather do the client-side using native browser JavaScript, instead of Flash or other proprietary tech.
What is the final answer to uploading huge files through the web, handling the server side in PHP or any other language?
It is possible to raise the limits in Apache and PHP to handle files of this size. The basic HTTP upload mechanism does not offer progress information, however, so I would usually consider this acceptable only for LAN-type connections.
The normal alternative is to locate a Flash or JavaScript uploader widget. These have the bonus that they can display progress information and will integrate well with a PHP-based website.
For PHP, see http://php.net/manual/en/features.file-upload.php
Note the ini file changes in the first comment.
Edit: Assuming you are running into timeout issues.
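For reference, the relevant php.ini directives look something like this; the values are illustrative, and the last two address the timeout side:

```ini
; Illustrative php.ini settings for very large uploads
upload_max_filesize = 2048M
post_max_size      = 2048M
memory_limit       = 512M
max_execution_time = 3600   ; seconds
max_input_time     = 3600   ; seconds
```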
My work has tasked me with determining the feasibility of migrating our existing in-house change management service (web based) to a SharePoint solution. I've found everything to be easy, except I've run into the issue that for each change management issue (there are several thousand) there may be any number of attachment files associated with it, fetched through JavaScript, that need to be downloaded and put into a document library.
(ex. ... onClick="DownloadAttachment(XXXXX,'ProjectID=YYYY');return false">Attachment... ).
To keep from manually selecting them all, I've been looking over posts from people wanting to do something similar, and there seem to be many possible solutions, but they often seem more complicated than they need to be.
So I suppose in a nutshell I'm asking what would be the best way to approach this issue that yields some sort of desktop application or script that can interact with web pages and will let me select and organize all the attachments. (Making a purely web-based app (PHP, JavaScript, Rails, etc.) is not an option for me, so throwing that out there now.)
Thanks in advance.
1. Given a document id and project id, XXXXX and YYYY respectively in your example, figure out the URL from which the file contents can be downloaded. You can observe a few URL links in the browser and detect the pattern which your web application uses.
2. Use a tool like Selenium to get a list of the XXXXXs and YYYYs of the documents you need to download (see the sketch below).
3. Write a bash script with wget to download the files locally and put them in the correct folders.
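For step 2, the harvesting could look something like this in Java with Selenium WebDriver; the page URL, CSS selector and regex are assumptions about what your markup actually looks like:

```java
// Sketch of step 2: use Selenium to harvest the XXXXX/YYYY pairs from the
// onClick handlers. The page URL, selector and regex are assumptions about your markup.
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class AttachmentIdHarvester {

    private static final Pattern CALL =
            Pattern.compile("DownloadAttachment\\((\\d+),'ProjectID=(\\d+)'\\)");

    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://changemgmt.example.local/issues");  // hypothetical page

            List<WebElement> links =
                    driver.findElements(By.cssSelector("a[onclick*='DownloadAttachment']"));
            for (WebElement link : links) {
                Matcher m = CALL.matcher(link.getAttribute("onclick"));
                if (m.find()) {
                    System.out.println(m.group(1) + "," + m.group(2));  // XXXXX,YYYY as CSV
                }
            }
        } finally {
            driver.quit();
        }
    }
}
```

Printing the pairs as CSV lets the wget step (or the script from the other answer below) consume them directly.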
This is a "one off" migration, right?
1. Get access to your in-house application's database, and create an SQL query which pulls out rows showing the attachment names (XXXXX?) and the issue/project (YYYY?), e.g.:

   | file_id | issue_id | file_name            |
   |       5 |      123 | Feasibility Test.xls |

2. Analyze the DownloadAttachment method and figure out how it generates the URL that it calls for each download.
3. Start a script (personally I'd go for Python) that will do the migration work.
4. Program the script to connect and run the SQL query, or to read a CSV file you create manually from step 1.
5. Program the script to use those details to determine the target filename and the URL to download from.
6. Program the script to download the file from the given URL, and place it on the hard drive with the proper name. (In Python, you might use urllib; there is a rough download sketch after the example listing below.)
Hopefully that will get you as far as a bunch of files categorized by "issue" like:
issue123/Feasibility Test.xls
issue123/Billing Invoice.doc
issue456/Feasibility Test.xls
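For what it's worth, the actual download step (step 6) can be very small. A rough sketch, shown in Java here although the Python/urllib version is just as short; the URL pattern is purely hypothetical and would be replaced by whatever DownloadAttachment really calls, and authentication/cookies are left out entirely:

```java
// Sketch of the download step: fetch one attachment and save it under its issue folder.
// The URL pattern is hypothetical; derive the real one from the DownloadAttachment method.
// Authentication (cookies, NTLM, etc.) is deliberately omitted.
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AttachmentDownloader {

    static void download(String fileId, String issueId, String fileName) throws Exception {
        // Hypothetical endpoint reconstructed from the onClick handler (step 2).
        URL url = new URL("https://changemgmt.example.local/DownloadAttachment.aspx"
                + "?FileID=" + fileId + "&ProjectID=" + issueId);

        Path target = Paths.get("issue" + issueId, fileName);
        Files.createDirectories(target.getParent());

        try (InputStream in = url.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws Exception {
        // These values would come from the SQL query / CSV in step 1.
        download("5", "123", "Feasibility Test.xls");
    }
}
```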
Thank you everyone. I was able to get what I needed using HtmlUnit and Java: I traversed a report I made of all change items with attachments, went to each one, copied the source code, searched it for instances of the download method, and collected the unique IDs of each attachment to build an .xls of all items and their attachments.
This has been discussed before here. Using Java, I have developed my web services on Tomcat for a media library. I want to add functionality for streaming media while dynamically transcoding it as appropriate for mobile clients. There are a few questions I am pondering over:
How exactly do I stream the files (both audio and video)? I am coming across many streaming servers - but I want something done in my own code running on Tomcat itself. Do I need to install one more server, i.e. the streaming server, and then redirect streaming requests to that server from Tomcat?
Is it really a good idea to dynamically transcode? Static transcoding means we have to replicate the same file in 'N' formats - something which is space-consuming and which I don't want. So is there a way out?
Is it possible to stream the data "as it is transcoded"... that is, I don't want to start streaming only when the transcoding has finished (as that introduces latency) - rather, I want to stream the transcoded bytes as they are produced. I apologize if this is an absurd requirement... I have no experience of either transcoding or streaming.
Other alternatives like ffmpeg, Xuggler and the other technologies mentioned here - are they a better approach for getting the job done?
I don't want to use any proprietary or paid alternative to achieve this goal, and I also want it to work in production environments. Hope to get some help here...
Thanks a lot !
Red5 is another possible solution. It's open source and is essentially Tomcat with some added features. I don't know how far back in time the split from the Tomcat codebase occurred, but the basics are all there (and the source, so you can patch what's missing).
Xuggler is a lib 'front end' for ffmpeg and plays nicely with Red5. If you intend to do lots of transcoding you'll probably run into this code along the way.
Between these two projects you can change A/V format and stream various media.
Unless you really need to roll your own, I'd recommend an OSS project with good community support.
For your questions:
1.) This is the standard space vs. performance tradeoff. You see the same thing in generating hash tables and other computationally expensive operations. If space is a larger issue than processor time, then dynamic transcoding is your only way out.
2.) Yes, you can stream during the transcode process. VLC http://www.videolan.org/vlc/ does this.
3.) I'd really look into VLC if I were you.
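If you do end up driving ffmpeg directly from your Tomcat code, streaming "as it is transcoded" is essentially just piping ffmpeg's stdout into the HTTP response. A rough sketch, where the source path, ffmpeg flags and container choice are illustrative rather than a tuned setup:

```java
// Rough sketch: transcode with ffmpeg and stream the bytes to the client as
// they are produced, instead of waiting for the whole transcode to finish.
// The source path, ffmpeg flags and MIME type are illustrative only.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TranscodeStreamServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // In a real app this would be looked up from the media library via a request parameter.
        String source = "/media/library/sample.avi";

        Process ffmpeg = new ProcessBuilder(
                "ffmpeg", "-i", source,
                "-c:v", "libx264", "-c:a", "aac",
                // fragmented MP4 so the output is usable over a non-seekable pipe
                "-movflags", "frag_keyframe+empty_moov",
                "-f", "mp4", "pipe:1")
                // send ffmpeg's log output to the server console so its stderr pipe never fills up
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();

        resp.setContentType("video/mp4");

        byte[] buffer = new byte[8192];
        try (InputStream transcoded = ffmpeg.getInputStream();
             OutputStream out = resp.getOutputStream()) {
            int n;
            while ((n = transcoded.read(buffer)) != -1) {
                out.write(buffer, 0, n);  // bytes go out as soon as ffmpeg emits them
            }
            out.flush();
        } finally {
            ffmpeg.destroy();
        }
    }
}
```

For production you would add request validation, kill the ffmpeg process if the client disconnects, and pick codec/container settings that suit your target devices.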