diff --git a/docs/en/jsl/nlu_for_healthcare.md b/docs/en/jsl/nlu_for_healthcare.md index 1bfd8ae64d..d9b29aa0c3 100644 --- a/docs/en/jsl/nlu_for_healthcare.md +++ b/docs/en/jsl/nlu_for_healthcare.md @@ -27,7 +27,7 @@ and the accompanying video below for an introduction to every healthcare domain. **Named entities** are sub-strings in a text that can be classified into catogires of a domain. For example, in the String `"Tesla is a great stock to invest in "` , the sub-string `"Tesla"` is a named entity, it can be classified with the label `company` by an ML algorithm. **Named entities** can easily be extracted by the various pre-trained Deep Learning based NER algorithms provided by NLU. -NER models can be trained for many different domains and aquire expert domain knowledge in each of them. JSL provides a wide array of experts for various Medical, Helathcare and Clinical domains +NER models can be trained for many different domains and acquire expert domain knowledge in each of them. JSL provides a wide array of experts for various Medical, Healthcare and Clinical domains. This algorithm is provided by **Spark NLP for Healthcare's** [MedicalNerModel](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators) @@ -70,7 +70,7 @@ Named Entities extracted by an NER model can be further classified into sub-clas All sentences have the entity `headache` which is of class `disease`. But there is a semantic difference on what the actual status of the disease mentioned in text is. In the first and third sentence, `Billy has no headache`, but in the second sentence `Billy actually has a sentence`. The `Entity Assertion` Algorithms provided by JSL solve this problem. The `disease` entity can be classified into `ABSENT` for the first case and into `PRESENT` for the second case. The third case can be classified into `PRESENT IN FAMILY`. -This has immense implications for various data analytical approaches in the helathcare domain. +This has immense implications for various data analytical approaches in the healthcare domain. I.e. imagine you want you want to make a study about hearth attacks and survival rate of potential procedures. You can process all your digital patient notes with an Medical NER model and filter for documents that have the `Hearth Attack` entity. But your collected data will have wrong data entries because of the above mentioned Entity status problem. You cannot deduct that a document is talking about a patient having a hearth attack, unless you **assert** that the problem is actually there which is what the Resolutions algorithms do for you. diff --git a/docs/en/jsl/release_notes.md b/docs/en/jsl/release_notes.md index 9a3a9f4f57..fd7e59575a 100644 --- a/docs/en/jsl/release_notes.md +++ b/docs/en/jsl/release_notes.md @@ -3359,7 +3359,7 @@ for the first time by NLU, including ancient and exotic languages like `Ancient On the healthcare NLP side, a new `ZeroShotRelationExtractionModel` is available, which can extract relations between clinical entities in an unsupervised fashion, no training required! Additionally, New French and Italian Deidentification models are available for clinical and healthcare domains. -Powerd by the fantastic [ Spark NLP for helathcare 3.5.0 release](https://nlp.johnsnowlabs.com/docs/en/spark_nlp_healthcare_versions/licensed_release_notes) +Powered by the fantastic [Spark NLP for Healthcare 3.5.0 release](https://nlp.johnsnowlabs.com/docs/en/spark_nlp_healthcare_versions/licensed_release_notes)
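To make the NER-plus-assertion workflow described in `nlu_for_healthcare.md` above concrete, here is a minimal sketch using NLU's one-liner API. The model references are assumptions based on NLU's usual naming scheme (they are not taken from this diff), and a configured Healthcare license is assumed.

```python
# Minimal sketch of the NER + assertion-status flow described above.
# Assumes a valid Healthcare license is already configured for NLU;
# the model references below follow NLU's naming scheme but are illustrative.
import nlu

text = "The patient denies any headache, but his mother has a history of migraines."

# Load a clinical NER model together with an assertion model in one pipeline
pipe = nlu.load("en.med_ner.clinical en.assert")

# One row per detected entity, with its class and its asserted status
# (e.g. present, absent, associated_with_someone_else)
df = pipe.predict(text)
print(df)
```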
@@ -4163,7 +4163,7 @@ Integrates the incredible [Spark NLP for Healthcare](https://nlp.johnsnowlabs.co ## NLU Version 3.3.0 -#### 2000%+ Speedup on small data, 63 new models for 100+ Languages with 6 new supported Transformer classes including BERT, XLM-RoBERTa, alBERT, Longformer, XLnet based models, 48 NER profiling helathcare pipelines and much more in John Snow Labs NLU 3.3.0 +#### 2000%+ Speedup on small data, 63 new models for 100+ Languages with 6 new supported Transformer classes including BERT, XLM-RoBERTa, alBERT, Longformer, XLnet based models, 48 NER profiling healthcare pipelines and much more in John Snow Labs NLU 3.3.0 We are incredibly excited to announce NLU 3.3.0 has been released! It comes with a up to 2000%+ speedup on small datasets, 6 new Types of Deep Learning transformer models, including diff --git a/docs/en/licensed_install.md b/docs/en/licensed_install.md index 44676f5d87..73ac368d87 100644 --- a/docs/en/licensed_install.md +++ b/docs/en/licensed_install.md @@ -90,7 +90,7 @@ The first step you need to carry out is installing johnsnowlabs library. This is
-#### 2. Installing Enterprise NLP (Finance, Legal, Helathcare) +#### 2. Installing Enterprise NLP (Finance, Legal, Healthcare) Import `johnsnowlabs` and use our one-liner `nlp.install()` to install all the dependencies, downloading the jars (yes, Spark NLP runs on top of the Java Virtual Machine!), preparing the cluster environment variables, licenses, etc! @@ -473,7 +473,7 @@ Make sure the following prerequisites are set:
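As a quick sketch of the one-liner flow above (assuming license credentials have already been provided to the `johnsnowlabs` library, e.g. via a license JSON, environment variables, or the interactive login):

```python
# Minimal sketch of the johnsnowlabs one-liner install and session start.
# Assumes license credentials are supplied beforehand (license JSON, environment
# variables, or the interactive login offered by nlp.install()).
from johnsnowlabs import nlp

nlp.install()        # downloads the licensed jars and wheels and caches the license
spark = nlp.start()  # starts a Spark session with the licensed libraries on the classpath
```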
-## Non-johnsnowlabs Helathcare NLP on Ubuntu +## Non-johnsnowlabs Healthcare NLP on Ubuntu > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. For installing John Snow Labs NLP libraries on an Ubuntu machine/VM please run the following command: @@ -511,7 +511,7 @@ The install script downloads a couple of example notebooks that you can use to s
-## Non-johnsnowlabs Helathcare NLP via Docker +## Non-johnsnowlabs Healthcare NLP via Docker > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. A docker image that contains all the required libraries for installing and running Enterprise Spark NLP libraries is also available. However, it does not contain the library itself, as it is licensed, and requires installation credentials. @@ -576,10 +576,10 @@ curl -o sparknlp_keys.txt https://raw.githubusercontent.com/JohnSnowLabs/spark-n
-## Non-johnsnowlabs Helathcare NLP on python +## Non-johnsnowlabs Healthcare NLP on Python > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. -You can install the Helathcare NLP by using: +You can install Healthcare NLP by using: ```bash pip install -q spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade @@ -658,7 +658,7 @@ If you want to download the source files (jar and whl files) locally, you can fo # Install Spark NLP from PyPI pip install spark-nlp==${public_version} -#install Spark NLP helathcare +# Install Spark NLP for Healthcare pip install spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade @@ -674,7 +674,7 @@ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:${public_version} --
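Once the packages are installed, a licensed session is typically started from Python. The sketch below is illustrative: the license-file name is an assumption, while `sparknlp_jsl.start(SECRET)` is the same call used in the SageMaker section further down this page.

```python
# Rough sketch: start a licensed Spark session after the pip installs above.
# The license JSON filename is an assumption; adjust it to wherever your
# John Snow Labs license file lives.
import json
import os

import sparknlp_jsl

with open("spark_nlp_for_healthcare.json") as f:
    license_keys = json.load(f)

os.environ.update(license_keys)                     # exposes SPARK_NLP_LICENSE, AWS keys, etc.
spark = sparknlp_jsl.start(license_keys["SECRET"])  # Spark session with the healthcare jar attached
print("Spark NLP for Healthcare version:", sparknlp_jsl.version())
```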
-## Non-johnsnowlabs Helathcare NLP for Scala +## Non-johnsnowlabs Healthcare NLP for Scala > These instructions use non-johnsnowlabs installation syntax, since `johnsnowlabs` is a Python library. #### Use Spark NLP in Spark shell @@ -701,7 +701,7 @@ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:${public-version} --j
-## Non-johnsnowlabs Helathcare NLP in Sbt project +## Non-johnsnowlabs Healthcare NLP in Sbt project > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. 1.Download the fat jar for Enterprise Spark NLP. @@ -733,7 +733,7 @@ unmanagedJars in Compile += file("lib/sparknlp-jsl.jar")
-## Non-johnsnowlabs Helathcare NLP on Colab +## Non-johnsnowlabs Healthcare NLP on Colab This is the way to run Clinical NLP in Google Colab if you don't use `johnsnowlabs` library. @@ -792,7 +792,7 @@ os.environ.update(license_keys)
-## Non-johnsnowlabs Helathcare NLP on GCP Dataproc +## Non-johnsnowlabs Healthcare NLP on GCP Dataproc > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. - You can follow the steps here for [installation via IU](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/platforms/dataproc) @@ -882,7 +882,7 @@ Or you can set `.master('yarn')`.
-## Non-johnsnowlabs Helathcare NLP on AWS SageMaker +## Non-johnsnowlabs Healthcare NLP on AWS SageMaker > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. 1. Access AWS Sagemaker in AWS. @@ -923,7 +923,7 @@ spark = sparknlp_jsl.start(license_keys['SECRET'])
-## Non-johnsnowlabs Helathcare NLP with Poetry +## Non-johnsnowlabs Healthcare NLP with Poetry > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. This is a sample `project.toml` file which you can use with `poetry install` to setup spark NLP + the Healthcare python library `spark-nlp-jsl` @@ -954,7 +954,7 @@ build-backend = "poetry.core.masonry.api"
-## Non-johnsnowlabs Helathcare NLP on AWS EMR +## Non-johnsnowlabs Healthcare NLP on AWS EMR > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. In this page we explain how to setup Spark-NLP + Spark-NLP Healthcare in AWS EMR, using the AWS console. @@ -971,21 +971,21 @@ In this page we explain how to setup Spark-NLP + Spark-NLP Healthcare in AWS EMR - select required applications -![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image.png "lit_shadow") +![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image.png "lit_shadow") - Specify EC2 instances for the cluster, as primary/master node and cores/workers - Specify the storage/ EBS volume - ![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-1.png "lit_shadow") + ![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-1.png "lit_shadow") - Choose Cluster scaling and provisioning - Choose Networking / VPC - ![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-2.png "lit_shadow") + ![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-2.png "lit_shadow") - Choose Security Groups/Firewall for primary/master node and cores/workers/slaves -![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-3.png "lit_shadow") +![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-3.png "lit_shadow") - If you have add steps , that will be executed after cluster is provisioned - Specify the S3 location for logs @@ -1064,7 +1064,7 @@ You can change spark configuration according to your needs.
-## Non-johnsnowlabs Helathcare NLP on Amazon Linux 2 +## Non-johnsnowlabs Healthcare NLP on Amazon Linux 2 > These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section. ```bash @@ -1091,7 +1091,7 @@ You can pick the index number (I am using java-8 as default - index 2):
-![Non-johnsnowlabs Helathcare NLP on Amazon Linux 2](/assets/images/installation/amazon-linux.png "lit_shadow") +![Non-johnsnowlabs Healthcare NLP on Amazon Linux 2](/assets/images/installation/amazon-linux.png "lit_shadow")
diff --git a/docs/en/spark_nlp_healthcare_versions/licensed_release_notes.md b/docs/en/spark_nlp_healthcare_versions/licensed_release_notes.md index 386baee398..ec166d3c2b 100644 --- a/docs/en/spark_nlp_healthcare_versions/licensed_release_notes.md +++ b/docs/en/spark_nlp_healthcare_versions/licensed_release_notes.md @@ -97,9 +97,6 @@ text = """he patient is a 42-year-old female and has diabetes mellitus with diab | scope_average | diabetes mellitus | E11.40 | nervous system disorder due to diabetes mellitus [type 2 diabetes mellitus with diabetic neuropathy, unspecified] | - - -
#### De-identifying Sensitive Data in Relational Databases with a Few Lines of Codes diff --git a/docs/en/spark_nlp_healthcare_versions/release_notes_5_5_2.md b/docs/en/spark_nlp_healthcare_versions/release_notes_5_5_2.md index 5b2ba786d5..70141346d8 100644 --- a/docs/en/spark_nlp_healthcare_versions/release_notes_5_5_2.md +++ b/docs/en/spark_nlp_healthcare_versions/release_notes_5_5_2.md @@ -97,9 +97,6 @@ text = """he patient is a 42-year-old female and has diabetes mellitus with diab | scope_average | diabetes mellitus | E11.40 | nervous system disorder due to diabetes mellitus [type 2 diabetes mellitus with diabetic neuropathy, unspecified] | - - -
#### De-identifying Sensitive Data in Relational Databases with a Few Lines of Codes @@ -395,8 +392,6 @@ Muc5AC, human epidermal growth factor receptor-2 (HER2), and Muc6; positive for Please check the [ZeroShot Clinical NER](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/1.6.ZeroShot_Clinical_NER.ipynb) Notebook for more information - -
#### Introducing Clinical Document Analysis with One-Liner Pretrained Pipelines for Specific Clinical Tasks and Concepts @@ -450,8 +445,6 @@ The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469 Please check the [Task Based Clinical Pretrained Pipelines](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.3.Task_Based_Clinical_Pretrained_Pipelines.ipynb) model for more information - -
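For orientation, the one-liner pipelines above are loaded through the standard `PretrainedPipeline` interface. A minimal sketch follows; `clinical_deidentification` is used here only as a familiar example, the task-based names shipped in this release are listed in the linked notebook.

```python
# Minimal sketch of running a clinical pretrained pipeline in one line.
# Assumes a licensed sparknlp_jsl session is already running; the pipeline
# name is illustrative -- see the linked notebook for the released names.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("clinical_deidentification", "en", "clinical/models")

text = "The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469."
result = pipeline.annotate(text)
print(result.keys())  # output fields, e.g. the masked/obfuscated sentences
```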
#### Introducing 2 New Named Entity Recognition and an Assertion Models for Gene and Phenotype Features diff --git a/docs/en/spark_ocr_versions/ocr_release_notes.md b/docs/en/spark_ocr_versions/ocr_release_notes.md index d2aee2bb13..634fe2e7f4 100644 --- a/docs/en/spark_ocr_versions/ocr_release_notes.md +++ b/docs/en/spark_ocr_versions/ocr_release_notes.md @@ -30,16 +30,19 @@ Release date: 23-01-2024 * New Dicom Pretrained Pipelines. * New VisualDocumentProcessor. +
+ ## New Obfuscation Features in ImageDrawRegions ImageDrawRegions' main purpose is to draw solid rectangles on top of regions that typically come from NER or some other similar model. Many times, it is interesting not to only draw solid rectangles on top of detected entities, but some other values, like obfuscated values. For example, with the purpose of protecting patient's privacy, you may want to replace a name with another name, or a date with a modified date. This feature, together with the Deidentification transformer from Spark NLP for Healthcare can be combined to create a 'rendering aware' obfuscation pipeline capable of rendering obfuscated values back to the source location where the original entities were present. The replacement must be 'rendering aware' because not every example of an entity requires the same space on the page to be rendered. So for example, 'Bob Smith' would be a good replacement for 'Rod Adams', but not for 'Alessandro Rocatagliata', simply because they render differently on the page. Let's take a look at a quick example, -![image](/assets/images/ocr/obfuscation_impainting.png) +![New Obfuscation Features in ImageDrawRegions](/assets/images/ocr/obfuscation_impainting.png) to the left we see a portion of a document in which we want to apply obfuscation. We want to focus on the entities representing PHI, like patient name or phone number. On the right side, after applying the transformation, we have an image containing fake values. You can see that the PHI in the source document has been replaced by similar entities, and these entities not only are of a similar category, but are also of a similar length. +
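For context, a bare-bones ImageDrawRegions stage (without the new obfuscation options) looks roughly like the sketch below; the column names and setters are assumptions based on the usual Visual NLP conventions and should be checked against the API reference.

```python
# Rough sketch of the classic ImageDrawRegions usage that this feature extends:
# draw boxes over regions coming from an upstream detector. Column names and
# setter names are assumptions, not taken from this release note.
from sparkocr.transformers import ImageDrawRegions

draw_regions = (
    ImageDrawRegions()
    .setInputCol("image")            # input image column
    .setInputRegionsCol("regions")   # regions produced by NER/position mapping upstream
    .setOutputCol("image_with_regions")
    .setFilledRect(True)             # solid rectangles over the detected entities
)
# With the new obfuscation options, the drawn regions can instead carry
# replacement (fake) values produced by the Deidentification transformer.
```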
## New obfuscation features in DicomMetadataDeidentifier Now you can customize the way metadata is de-identified in DicomMetadataDeidentifier. Customization happens through a number of different actions you can apply to each tag, for example, replacing a specific tag with a literal, or shifting a date by a number of days randomly. @@ -70,6 +73,7 @@ ShiftTimeByRandomNbOfSecs | DT | coherent replaceWithRandomName | PN, LO | coherent shiftDateByFixedNbOfDays | DA | 112 +
### New Dicom Pretrained Pipelines We are releasing three new Dicom Pretrained Pipelines: @@ -79,6 +83,8 @@ We are releasing three new Dicom Pretrained Pipelines: Check notebook [here](https://github.com/JohnSnowLabs/visual-nlp-workshop/blob/master/jupyter/Dicom/SparkOcrDicomPretrainedPipelines.ipynb) for examples on how to use this. +
+ ### New Visual Document Processor New VisualDocumentProcessor that produces OCR text and tables on a single pass!, In plugs and play into any Visual NLP pipeline, it receives images, and it returns texts and tables following the same existing schemas for these datatypes, @@ -93,6 +99,8 @@ result = proc.transform(df) Check this [sample notebook](https://github.com/JohnSnowLabs/visual-nlp-workshop/blob/master/jupyter/SparkOcrVisualDocumentProcessor.ipynb) for an example on how to use it. +
+ ### Other Dicom Changes * DicomDrawRegions support for setting compression quality, now you can pick different compression qualities for each of the different compression algorithms supported. The API receives an array with each element specifying the compression type like a key/value, Example, @@ -101,6 +109,8 @@ DicomDrawRegions()\ .setCompressionQuality(["8Bit=90","LSNearLossless=2"]) ``` +
+ ### Enhancements & Bug Fixes * New parameter in SVS tool that specifies whether to rename output file or not, ``` diff --git a/docs/en/spark_ocr_versions/release_notes_5_5_0.md b/docs/en/spark_ocr_versions/release_notes_5_5_0.md index cdd4733892..f210688cb1 100644 --- a/docs/en/spark_ocr_versions/release_notes_5_5_0.md +++ b/docs/en/spark_ocr_versions/release_notes_5_5_0.md @@ -30,16 +30,19 @@ Release date: 23-01-2024 * New Dicom Pretrained Pipelines. * New VisualDocumentProcessor. +
+ ## New Obfuscation Features in ImageDrawRegions ImageDrawRegions' main purpose is to draw solid rectangles on top of regions that typically come from NER or some other similar model. Many times, it is interesting not to only draw solid rectangles on top of detected entities, but some other values, like obfuscated values. For example, with the purpose of protecting patient's privacy, you may want to replace a name with another name, or a date with a modified date. This feature, together with the Deidentification transformer from Spark NLP for Healthcare can be combined to create a 'rendering aware' obfuscation pipeline capable of rendering obfuscated values back to the source location where the original entities were present. The replacement must be 'rendering aware' because not every example of an entity requires the same space on the page to be rendered. So for example, 'Bob Smith' would be a good replacement for 'Rod Adams', but not for 'Alessandro Rocatagliata', simply because they render differently on the page. Let's take a look at a quick example, -![image](/assets/images/ocr/obfuscation_impainting.png) +![New Obfuscation Features in ImageDrawRegions](/assets/images/ocr/obfuscation_impainting.png) to the left we see a portion of a document in which we want to apply obfuscation. We want to focus on the entities representing PHI, like patient name or phone number. On the right side, after applying the transformation, we have an image containing fake values. You can see that the PHI in the source document has been replaced by similar entities, and these entities not only are of a similar category, but are also of a similar length. +
## New obfuscation features in DicomMetadataDeidentifier Now you can customize the way metadata is de-identified in DicomMetadataDeidentifier. Customization happens through a number of different actions you can apply to each tag, for example, replacing a specific tag with a literal, or shifting a date by a number of days randomly. @@ -70,6 +73,7 @@ ShiftTimeByRandomNbOfSecs | DT | coherent replaceWithRandomName | PN, LO | coherent shiftDateByFixedNbOfDays | DA | 112 +
### New Dicom Pretrained Pipelines We are releasing three new Dicom Pretrained Pipelines: @@ -79,6 +83,8 @@ We are releasing three new Dicom Pretrained Pipelines: Check notebook [here](https://github.com/JohnSnowLabs/visual-nlp-workshop/blob/master/jupyter/Dicom/SparkOcrDicomPretrainedPipelines.ipynb) for examples on how to use this. +
+ ### New Visual Document Processor New VisualDocumentProcessor that produces OCR text and tables on a single pass!, In plugs and play into any Visual NLP pipeline, it receives images, and it returns texts and tables following the same existing schemas for these datatypes, @@ -93,6 +99,8 @@ result = proc.transform(df) Check this [sample notebook](https://github.com/JohnSnowLabs/visual-nlp-workshop/blob/master/jupyter/SparkOcrVisualDocumentProcessor.ipynb) for an example on how to use it. +
+ ### Other Dicom Changes * DicomDrawRegions support for setting compression quality, now you can pick different compression qualities for each of the different compression algorithms supported. The API receives an array with each element specifying the compression type like a key/value, Example, @@ -101,6 +109,8 @@ DicomDrawRegions()\ .setCompressionQuality(["8Bit=90","LSNearLossless=2"]) ``` +
+ ### Enhancements & Bug Fixes * New parameter in SVS tool that specifies whether to rename output file or not, ``` @@ -120,4 +130,4 @@ This release is compatible with Spark-NLP 5.5.2, and Spark NLP for Healthcare 5.
-{%- include docs-sparckocr-pagination.html -%} +{%- include docs-sparckocr-pagination.html -%} \ No newline at end of file