How can I translate this deserialization code from Java to Scala?

I'm a Scala/Java noob, so sorry if this has a relatively easy solution--but I'm trying to access a model in an external file (an Apache OpenNLP model), and I'm not sure where I'm going wrong. Here's how you'd do it in Java, and here's what I'm trying:
import java.io._
val nlpModelPath = new java.io.File( "." ).getCanonicalPath + "/lib/models/en-sent.bin"
val modelIn: InputStream = new FileInputStream(nlpModelPath)
which works fine, but trying to instantiate an object based off the model in that binary file is where I'm failing:
val sentenceModel = new modelIn.SentenceModel // type SentenceModel is not a member of java.io.InputStream
val sentenceModel = new modelIn("SentenceModel") // not found: type modelIn
I've also tried a DataInputStream:
val file = new File(nlpModelPath)
val dis = new DataInputStream(file)
val sentenceModel = dis.SentenceModel() // value SentenceModel is not a member of java.io.DataInputStream
I'm not sure what I'm missing--maybe some method to convert the Stream to some binary object from which I can pull in methods? Thank you for any pointers.

The problem is that you're using the wrong syntax (please don't take it personally, but why not read a beginner Java book, or even just a tutorial, first if you are planning to stick with Java or Scala for some time?).
Code you would write in java
SentenceModel model = new SentenceModel(modelIn);
will look similar in scala:
val model: SentenceModel = new SentenceModel(modelIn)
// or just
val model = new SentenceModel(modelIn)
The problem you got with this syntax is that you forgot to import the definition of SentenceModel, so the compiler simply has no clue what SentenceModel is.
Add
import opennlp.tools.sentdetect.SentenceModel
at the top of your .scala file and this will fix it.
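The general pattern behind the answer can be illustrated with classes from the Java standard library alone: an object that reads from a stream is built by passing the stream to its constructor, not by calling a member on the stream. The sketch below uses DataInputStream as a stand-in for SentenceModel; the class name StreamConstructorDemo and its helper methods are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class StreamConstructorDemo {

    // Writes one int and one string into an in-memory byte array.
    public static byte[] writeSample(int number, String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(number);
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the values back. The reader object is constructed FROM the
    // stream, the same shape as `new SentenceModel(modelIn)`.
    public static String readSample(InputStream in) {
        try (DataInputStream data = new DataInputStream(in)) {
            int number = data.readInt();
            String text = data.readUTF();
            return number + ":" + text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeSample(42, "en-sent");
        System.out.println(readSample(new ByteArrayInputStream(bytes))); // 42:en-sent
    }
}
```

Note that DataInputStream wraps an InputStream, not a File, which is why the `new DataInputStream(file)` attempt in the question cannot compile either.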

Related

How to convert Scala FilePart as File (Java) to use in multipart-form data?

I have a method that already receives a FilePart[TemporaryFile], and from it I call another method to send multipart form data. The code uses Scala Play 2.4.x, and I have to send the request with the ning client as shown below:
def sendFile(file: FilePart[TemporaryFile]): Option[Future[Unit]] = {
val asyncHttpClient:AsyncHttpClient = WS.client.underlying
val postBuilder = asyncHttpClient.preparePost(s"${config.ocrProvider.host}")
val multiPartPost = postBuilder
.addBodyPart(new StringPart("access_token",s"${config.ocrProvider.accessToken}"))
.addBodyPart(new StringPart("typename",s"${config.ocrProvider.typeName}"))
.addBodyPart(new StringPart("action",s"${config.ocrProvider.actionUpload}"))
.addBodyPart(new FilePart(/* expects java.io.File, not FilePart */))
}
How can I use this parameter and send it as a java.io.File?
You need to write the content of file: FilePart[TemporaryFile] to disk and then use that file to construct the new multipart request. See the example in Scala File Upload:
val tempFile = new File("/tmp/some/path")
file.ref.moveTo(tempFile)
val filePart = new FilePart(tempFile)

Groovy deep copy json map

I am trying to create a deep copy of a JSON map in groovy for a build config script.
I have tried the selected answer
def deepcopy(orig) {
bos = new ByteArrayOutputStream()
oos = new ObjectOutputStream(bos)
oos.writeObject(orig); oos.flush()
bin = new ByteArrayInputStream(bos.toByteArray())
ois = new ObjectInputStream(bin)
return ois.readObject()
}
from this existing question, but it fails for JSON maps with java.io.NotSerializableException: groovy.json.internal.LazyMap
How can I create a deep copy of the JSON map?
Once you read the JSON, you have the copy.
import groovy.json.JsonSlurper
import groovy.json.JsonOutput
def json = new JsonSlurper().parseText('''{"l1": {"l2": {"l3": 42}}}''')
json.l1.l2.l3 = 23
assert '''{"l2":{"l3":23}}''' == JsonOutput.toJson(json.l1)
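If an in-memory copy is needed without re-reading the JSON text, a recursive copy is another option. The following is a minimal sketch in Java, not the approach from the answer above. It assumes the parsed value consists only of java.util.Map and java.util.List containers holding immutable scalars (strings, numbers, booleans, null). JsonDeepCopy and deepCopy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonDeepCopy {

    // Recursively copies maps and lists. Any other value is assumed to be
    // an immutable scalar and is shared between original and copy.
    public static Object deepCopy(Object value) {
        if (value instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                copy.put(entry.getKey(), deepCopy(entry.getValue()));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(deepCopy(item));
            }
            return copy;
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> l2 = new LinkedHashMap<>();
        l2.put("l3", 42);
        Map<String, Object> l1 = new LinkedHashMap<>();
        l1.put("l2", l2);
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("l1", l1);

        Object copy = deepCopy(original);
        l2.put("l3", 23); // mutate the original only

        System.out.println(original); // {l1={l2={l3=23}}}
        System.out.println(copy);     // {l1={l2={l3=42}}}
    }
}
```

Because the copy is built from plain LinkedHashMap and ArrayList instances, it no longer depends on whether the parser's own map class is serializable.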

Scala PDFBox error in code

I wrote a function for reading text from a PDF document.
It uses Scala, Selenium, and PDFBox 2.0.1.
Below is the code:
import org.openqa.selenium.firefox.{FirefoxBinary, FirefoxDriver, FirefoxProfile}
import org.apache.pdfbox.pdfparser.PDFParser
import org.apache.pdfbox.text.PDFTextStripper
import java.io.BufferedInputStream
def pdfreaddata {
driver.get("https://www.....pdf")
driver.manage.timeouts.implicitlyWait(50, TimeUnit.SECONDS)
val url: URL = new URL(driver.getCurrentUrl)
println(url)
val fileToParse: BufferedInputStream = new BufferedInputStream(url.openStream())
val parser: PDFParser = new PDFParser(fileToParse)
parser.parse()
val output: String = new PDFTextStripper().getText(parser.getPDDocument)
println("pdf Value" + output)
parser.getPDDocument.close()
driver.manage.timeouts.implicitlyWait(100, TimeUnit.SECONDS)
}
The error is reported for PDFParser on the line val parser: PDFParser = new PDFParser(fileToParse)
Error message:
Cannot resolve constructor
I tried the code in Java too and got the same error.
You are using PDFBox 2.x, but you are evidently following the documentation for version 1.x. In 2.0, PDFParser no longer has a constructor that takes an InputStream; parsing is one of the things that changed, and a document is normally opened with PDDocument.load(...) instead. Follow the migration guide, or fall back to 1.8, which appears to be better documented and has more material online.
Using PDFBox 1.8.12 solved the constructor issue. However, even though the PDF was not password protected, it was reported as encrypted. Below is the final Scala code for extracting text from an encrypted PDF document, in case it is useful to someone in the future.
def pdfreaddata {
driver.get("https://www....combo.pdf")
driver.manage.timeouts.implicitlyWait(50, TimeUnit.SECONDS)
val url: URL = new URL(driver.getCurrentUrl)
println(url)
val fileToParse: BufferedInputStream = new BufferedInputStream(url.openStream())
val parser: PDFParser = new PDFParser(fileToParse)
parser.parse()
val cosDocument:COSDocument = parser.getDocument()
val pdDocument:PDDocument = new PDDocument(cosDocument)
if(pdDocument.isEncrypted()) {
val sdm: StandardDecryptionMaterial = new StandardDecryptionMaterial(PDF_OWNER_PASSWORD)//PDF_OWNER_PASSWORD =""
pdDocument.openProtection(sdm)
}
val output: String = new PDFTextStripper().getText(pdDocument)
println("pdf Value" + output)
pdDocument.close()
driver.manage.timeouts.implicitlyWait(100, TimeUnit.SECONDS)
}

Java InputStreamReader Error (org.apache.poi.openxml4j.exceptions.InvalidOperationException)

I am trying to convert pptx files to txt (text extraction) using the Apache POI framework (Java).
I'm new to Java, so I don't know a lot about BufferedReader, InputStream, etc.
What I tried is:
import org.apache.poi.xslf.XSLFSlideShow;
import org.apache.poi.xslf.extractor.XSLFPowerPointExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
... Classes and Stuff ....
String inputfile = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
InputStream fis = new FileInputStream(inputfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fis));
String fileName = br1.readLine();
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(fileName))).getText());
br1.close();
My goal is to store the extracted text in a variable, but I can't even get it to print to the console. What I get is:
org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'PK
org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:102)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
org.apache.poi.POIXMLDocument.openPackage(POIXMLDocument.java:69)
org.apache.poi.xslf.XSLFSlideShow.<init>(XSLFSlideShow.java:90)
Any help would be greatly appreciated!
You are doing far too much. In fact, you are reading the first line of the PPTX file's own data and then using it as the file name. It is better to simply use
System.out.println(new XSLFPowerPointExtractor(
new XMLSlideShow(new XSLFSlideShow(
"X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"))).getText());
or, more generically,
POITextExtractor extractor = ExtractorFactory.createExtractor(
new java.io.File("X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx"));
System.out.println(extractor.getText());
extractor.close();
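The 'PK in the exception message is explained by the file format: a .pptx file is a ZIP archive, and ZIP data begins with the bytes "PK". The sketch below reproduces the effect using only the Java standard library, with a small in-memory ZIP standing in for the presentation. ZipFirstLineDemo, buildZip, and firstLine are hypothetical names for this illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipFirstLineDemo {

    // Builds a tiny ZIP archive in memory as a stand-in for a .pptx file.
    public static byte[] buildZip() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("slide1.xml"));
            zip.write("<slide/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Mimics the question's code: treats binary content as text and
    // reads its first "line".
    public static String firstLine(byte[] content) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.ISO_8859_1))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = firstLine(buildZip());
        System.out.println(line.startsWith("PK")); // true
    }
}
```

Passing that string on as a file name is what makes the library report that it cannot open a file called 'PK...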
I cannot give you a definitive answer (because I don't use POI myself), but I can tell you where your mistake might lie.
The constructor of the class XSLFSlideShow expects a file path as its argument, but you are passing it a line read from the file's contents. Try it as follows:
String filePath = "X:\\Master\\simpl_temp\\2d0a44a2-95e7-428c-911c-1f803acbff42.pptx";
System.out.println(new XSLFPowerPointExtractor(new XMLSlideShow(new XSLFSlideShow(filePath))).getText());

How to convert HttpPostedFileBase file to Java.Io.InputStream?

I'm working on ASP.net with the MPXJ library. The .net version of MPXJ has been created using IKVM.
Currently, I have a big problem: after uploading a file (a Microsoft Project .mpp file) to the server (I don't need to save it), I want to convert the HttpPostedFileBase to the IKVM version of java.io.InputStream so that MPXJ can work with it, but I don't know how to implement this.
My code:
public ActionResult Upload(HttpPostedFileBase files)
{
// Todo: Convert from HttpPostedFileBase to Java.Io.InputStream
ProjectReader reader = new MPPReader();
ProjectFile projectObj = reader.read(Java.Io.InputStream);
}
You need a wrapper to provide a conversion between the IKVM Java type java.io.InputStream and a .net Stream instance. As luck would have it, IKVM ships with one...
Using the wrapper, your example will now look like this:
public ActionResult Upload(HttpPostedFileBase files)
{
ProjectReader reader = new MPPReader();
ProjectFile projectObj = reader.read(new ikvm.io.InputStreamWrapper(files.InputStream));
}
If you don't want to use the IKVM wrapper class, you can read the upload into a byte array and pass it to a java.io.ByteArrayInputStream instead:
public ActionResult Upload(HttpPostedFileBase files)
{
ProjectReader reader = new MPPReader();
byte[] fileData = null;
using (var binaryReader = new BinaryReader(files.InputStream))
{
fileData = binaryReader.ReadBytes(files.ContentLength);
}
ProjectFile projectObj = reader.read(new java.io.ByteArrayInputStream(fileData));
}
