helathcare (#1706)
agsfer authored Jan 30, 2025
1 parent 4da65ce commit 3e28540
Showing 7 changed files with 46 additions and 36 deletions.
4 changes: 2 additions & 2 deletions docs/en/jsl/nlu_for_healthcare.md
@@ -27,7 +27,7 @@ and the accompanying video below for an introduction to every healthcare domain.
**Named entities** are sub-strings in a text that can be classified into catogires of a domain. For example, in the String
`"Tesla is a great stock to invest in "` , the sub-string `"Tesla"` is a named entity, it can be classified with the label `company` by an ML algorithm.
**Named entities** can easily be extracted by the various pre-trained Deep Learning based NER algorithms provided by NLU.
NER models can be trained for many different domains and aquire expert domain knowledge in each of them. JSL provides a wide array of experts for various Medical, Helathcare and Clinical domains
NER models can be trained for many different domains and aquire expert domain knowledge in each of them. JSL provides a wide array of experts for various Medical, Healthcare and Clinical domains

This algorithm is provided by **Spark NLP for Healthcare's** [MedicalNerModel](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators)

@@ -70,7 +70,7 @@ Named Entities extracted by an NER model can be further classified into sub-clas
All sentences have the entity `headache` which is of class `disease`.
But there is a semantic difference on what the actual status of the disease mentioned in text is. In the first and third sentence, `Billy has no headache`, but in the second sentence `Billy actually has a sentence`.
The `Entity Assertion` Algorithms provided by JSL solve this problem. The `disease` entity can be classified into `ABSENT` for the first case and into `PRESENT` for the second case. The third case can be classified into `PRESENT IN FAMILY`.
This has immense implications for various data analytical approaches in the helathcare domain.
This has immense implications for various data analytical approaches in the healthcare domain.

I.e. imagine you want you want to make a study about hearth attacks and survival rate of potential procedures. You can process all your digital patient notes with an Medical NER model and filter for documents that have the `Hearth Attack` entity.
But your collected data will have wrong data entries because of the above mentioned Entity status problem. You cannot deduct that a document is talking about a patient having a hearth attack, unless you **assert** that the problem is actually there which is what the Resolutions algorithms do for you.
4 changes: 2 additions & 2 deletions docs/en/jsl/release_notes.md
@@ -3359,7 +3359,7 @@ for the first time by NLU, including ancient and exotic languages like `Ancient
On the healthcare NLP side, a new `ZeroShotRelationExtractionModel` is available, which can extract relations between
clinical entities in an unsupervised fashion, no training required!
Additionally, New French and Italian Deidentification models are available for clinical and healthcare domains.
Powerd by the fantastic [ Spark NLP for helathcare 3.5.0 release](https://nlp.johnsnowlabs.com/docs/en/spark_nlp_healthcare_versions/licensed_release_notes)
Powerd by the fantastic [ Spark NLP for healthcare 3.5.0 release](https://nlp.johnsnowlabs.com/docs/en/spark_nlp_healthcare_versions/licensed_release_notes)

</div><div class="h3-box" markdown="1">

@@ -4163,7 +4163,7 @@ Integrates the incredible [Spark NLP for Healthcare](https://nlp.johnsnowlabs.co

## NLU Version 3.3.0

#### 2000%+ Speedup on small data, 63 new models for 100+ Languages with 6 new supported Transformer classes including BERT, XLM-RoBERTa, alBERT, Longformer, XLnet based models, 48 NER profiling helathcare pipelines and much more in John Snow Labs NLU 3.3.0
#### 2000%+ Speedup on small data, 63 new models for 100+ Languages with 6 new supported Transformer classes including BERT, XLM-RoBERTa, alBERT, Longformer, XLnet based models, 48 NER profiling healthcare pipelines and much more in John Snow Labs NLU 3.3.0

We are incredibly excited to announce NLU 3.3.0 has been released!
It comes with a up to 2000%+ speedup on small datasets, 6 new Types of Deep Learning transformer models, including
38 changes: 19 additions & 19 deletions docs/en/licensed_install.md
@@ -90,7 +90,7 @@ The first step you need to carry out is installing johnsnowlabs library. This is

</div><div class="h3-box" markdown="1">

#### 2. Installing Enterprise NLP (Finance, Legal, Helathcare)
#### 2. Installing Enterprise NLP (Finance, Legal, Healthcare)

Import `johnsnowlabs` and use our one-liner `nlp.install()` to install all the dependencies, downloading the jars (yes, Spark NLP runs on top of the Java Virtual Machine!), preparing the cluster environment variables, licenses, etc!

@@ -473,7 +473,7 @@ Make sure the following prerequisites are set:

</div><div class="h3-box" markdown="1">

## Non-johnsnowlabs Helathcare NLP on Ubuntu
## Non-johnsnowlabs Healthcare NLP on Ubuntu
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.

For installing John Snow Labs NLP libraries on an Ubuntu machine/VM please run the following command:
@@ -511,7 +511,7 @@ The install script downloads a couple of example notebooks that you can use to s

</div><div class="h3-box" markdown="1">

## Non-johnsnowlabs Helathcare NLP via Docker
## Non-johnsnowlabs Healthcare NLP via Docker
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.

A docker image that contains all the required libraries for installing and running Enterprise Spark NLP libraries is also available. However, it does not contain the library itself, as it is licensed, and requires installation credentials.
@@ -576,10 +576,10 @@ curl -o sparknlp_keys.txt https://raw.githubusercontent.com/JohnSnowLabs/spark-n
</div><div class="h3-box" markdown="1">
## Non-johnsnowlabs Helathcare NLP on python
## Non-johnsnowlabs Healthcare NLP on python
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
You can install the Helathcare NLP by using:
You can install the Healthcare NLP by using:
```bash
pip install -q spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade
@@ -658,7 +658,7 @@ If you want to download the source files (jar and whl files) locally, you can fo
# Install Spark NLP from PyPI
pip install spark-nlp==${public_version}
#install Spark NLP helathcare
#install Spark NLP Healthcare
pip install spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade
@@ -674,7 +674,7 @@ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:${public_version} --
</div><div class="h3-box" markdown="1">
## Non-johnsnowlabs Helathcare NLP for Scala
## Non-johnsnowlabs Healthcare NLP for Scala
> These instructions use non-johnsnowlabs installation syntax, since `johnsnowlabs` is a Python library.
#### Use Spark NLP in Spark shell
@@ -701,7 +701,7 @@ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:${public-version} --j
</div><div class="h3-box" markdown="1">
## Non-johnsnowlabs Helathcare NLP in Sbt project
## Non-johnsnowlabs Healthcare NLP in Sbt project
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
1.Download the fat jar for Enterprise Spark NLP.
@@ -733,7 +733,7 @@ unmanagedJars in Compile += file("lib/sparknlp-jsl.jar")
</div><div class="h3-box" markdown="1">
## Non-johnsnowlabs Helathcare NLP on Colab
## Non-johnsnowlabs Healthcare NLP on Colab
This is the way to run Clinical NLP in Google Colab if you don't use `johnsnowlabs` library.

@@ -792,7 +792,7 @@ os.environ.update(license_keys)

</div><div class="h3-box" markdown="1">

## Non-johnsnowlabs Helathcare NLP on GCP Dataproc
## Non-johnsnowlabs Healthcare NLP on GCP Dataproc
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.

- You can follow the steps here for [installation via IU](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/platforms/dataproc)
@@ -882,7 +882,7 @@ Or you can set `.master('yarn')`.
</div><div class="h3-box" markdown="1">
## Non-johnsnowlabs Helathcare NLP on AWS SageMaker
## Non-johnsnowlabs Healthcare NLP on AWS SageMaker
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
1. Access AWS Sagemaker in AWS.
@@ -923,7 +923,7 @@ spark = sparknlp_jsl.start(license_keys['SECRET'])
</div><div class="h3-box" markdown="1">
## Non-johnsnowlabs Helathcare NLP with Poetry
## Non-johnsnowlabs Healthcare NLP with Poetry
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
This is a sample `project.toml` file which you can use with `poetry install` to setup spark NLP + the Healthcare python library `spark-nlp-jsl`
@@ -954,7 +954,7 @@ build-backend = "poetry.core.masonry.api"
</div><div class="h3-box" markdown="1">
## Non-johnsnowlabs Helathcare NLP on AWS EMR
## Non-johnsnowlabs Healthcare NLP on AWS EMR
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
In this page we explain how to setup Spark-NLP + Spark-NLP Healthcare in AWS EMR, using the AWS console.
@@ -971,21 +971,21 @@ In this page we explain how to setup Spark-NLP + Spark-NLP Healthcare in AWS EMR
- select required applications
![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image.png "lit_shadow")
![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image.png "lit_shadow")
- Specify EC2 instances for the cluster, as primary/master node and cores/workers
- Specify the storage/ EBS volume
![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-1.png "lit_shadow")
![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-1.png "lit_shadow")
- Choose Cluster scaling and provisioning
- Choose Networking / VPC
![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-2.png "lit_shadow")
![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-2.png "lit_shadow")
- Choose Security Groups/Firewall for primary/master node and cores/workers/slaves
![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-3.png "lit_shadow")
![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-3.png "lit_shadow")
- If you have add steps , that will be executed after cluster is provisioned
- Specify the S3 location for logs
@@ -1064,7 +1064,7 @@ You can change spark configuration according to your needs.

</div><div class="h3-box" markdown="1">

## Non-johnsnowlabs Helathcare NLP on Amazon Linux 2
## Non-johnsnowlabs Healthcare NLP on Amazon Linux 2
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.

```bash
@@ -1091,7 +1091,7 @@ You can pick the index number (I am using java-8 as default - index 2):

</div><div class="h3-box" markdown="1">

![Non-johnsnowlabs Helathcare NLP on Amazon Linux 2](/assets/images/installation/amazon-linux.png "lit_shadow")
![Non-johnsnowlabs Healthcare NLP on Amazon Linux 2](/assets/images/installation/amazon-linux.png "lit_shadow")

</div><div class="h3-box" markdown="1">

Original file line number Diff line number Diff line change
@@ -97,9 +97,6 @@ text = """he patient is a 42-year-old female and has diabetes mellitus with diab
| scope_average | diabetes mellitus | E11.40 | nervous system disorder due to diabetes mellitus [type 2 diabetes mellitus with diabetic neuropathy, unspecified] |





</div><div class="h3-box" markdown="1">

#### De-identifying Sensitive Data in Relational Databases with a Few Lines of Codes
7 changes: 0 additions & 7 deletions docs/en/spark_nlp_healthcare_versions/release_notes_5_5_2.md
@@ -97,9 +97,6 @@ text = """he patient is a 42-year-old female and has diabetes mellitus with diab
| scope_average | diabetes mellitus | E11.40 | nervous system disorder due to diabetes mellitus [type 2 diabetes mellitus with diabetic neuropathy, unspecified] |





</div><div class="h3-box" markdown="1">

#### De-identifying Sensitive Data in Relational Databases with a Few Lines of Codes
@@ -395,8 +392,6 @@ Muc5AC, human epidermal growth factor receptor-2 (HER2), and Muc6; positive for

Please check the [ZeroShot Clinical NER](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/1.6.ZeroShot_Clinical_NER.ipynb) Notebook for more information



</div><div class="h3-box" markdown="1">

#### Introducing Clinical Document Analysis with One-Liner Pretrained Pipelines for Specific Clinical Tasks and Concepts
@@ -450,8 +445,6 @@ The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469

Please check the [Task Based Clinical Pretrained Pipelines](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.3.Task_Based_Clinical_Pretrained_Pipelines.ipynb) model for more information



</div><div class="h3-box" markdown="1">

#### Introducing 2 New Named Entity Recognition and an Assertion Models for Gene and Phenotype Features
12 changes: 11 additions & 1 deletion docs/en/spark_ocr_versions/ocr_release_notes.md
@@ -30,16 +30,19 @@ Release date: 23-01-2024
* New Dicom Pretrained Pipelines.
* New VisualDocumentProcessor.

</div><div class="h3-box" markdown="1">

## New Obfuscation Features in ImageDrawRegions
ImageDrawRegions' main purpose is to draw solid rectangles on top of regions that typically come from NER or some other similar model. Many times, it is interesting not to only draw solid rectangles on top of detected entities, but some other values, like obfuscated values. For example, with the purpose of protecting patient's privacy, you may want to replace a name with another name, or a date with a modified date.

This feature, together with the Deidentification transformer from Spark NLP for Healthcare can be combined to create a 'rendering aware' obfuscation pipeline capable of rendering obfuscated values back to the source location where the original entities were present. The replacement must be 'rendering aware' because not every example of an entity requires the same space on the page to be rendered. So for example, 'Bob Smith' would be a good replacement for 'Rod Adams', but not for 'Alessandro Rocatagliata', simply because they render differently on the page. Let's take a look at a quick example,

![image](/assets/images/ocr/obfuscation_impainting.png)
![New Obfuscation Features in ImageDrawRegions](/assets/images/ocr/obfuscation_impainting.png)

to the left we see a portion of a document in which we want to apply obfuscation. We want to focus on the entities representing PHI, like patient name or phone number. On the right side, after applying the transformation, we have an image containing fake values.
You can see that the PHI in the source document has been replaced by similar entities, and these entities not only are of a similar category, but are also of a similar length.

</div><div class="h3-box" markdown="1">

## New obfuscation features in DicomMetadataDeidentifier
Now you can customize the way metadata is de-identified in DicomMetadataDeidentifier. Customization happens through a number of different actions you can apply to each tag, for example, replacing a specific tag with a literal, or shifting a date by a number of days randomly.
@@ -70,6 +73,7 @@ ShiftTimeByRandomNbOfSecs | DT | coherent
replaceWithRandomName | PN, LO | coherent
shiftDateByFixedNbOfDays | DA | 112

</div><div class="h3-box" markdown="1">

### New Dicom Pretrained Pipelines
We are releasing three new Dicom Pretrained Pipelines:
@@ -79,6 +83,8 @@ We are releasing three new Dicom Pretrained Pipelines:

Check notebook [here](https://github.com/JohnSnowLabs/visual-nlp-workshop/blob/master/jupyter/Dicom/SparkOcrDicomPretrainedPipelines.ipynb) for examples on how to use this.

</div><div class="h3-box" markdown="1">

### New Visual Document Processor
New VisualDocumentProcessor that produces OCR text and tables on a single pass!,
In plugs and play into any Visual NLP pipeline, it receives images, and it returns texts and tables following the same existing schemas for these datatypes,
@@ -93,6 +99,8 @@ result = proc.transform(df)

Check this [sample notebook](https://github.com/JohnSnowLabs/visual-nlp-workshop/blob/master/jupyter/SparkOcrVisualDocumentProcessor.ipynb) for an example on how to use it.

</div><div class="h3-box" markdown="1">

### Other Dicom Changes
* DicomDrawRegions support for setting compression quality, now you can pick different compression qualities for each of the different compression algorithms supported. The API receives an array with each element specifying the compression type like a key/value,
Example,
@@ -101,6 +109,8 @@ DicomDrawRegions()\
.setCompressionQuality(["8Bit=90","LSNearLossless=2"])
```

</div><div class="h3-box" markdown="1">

### Enhancements & Bug Fixes
* New parameter in SVS tool that specifies whether to rename output file or not,
```