Difference between Dependency Management and Dependencies in Maven

Maven is a software project management tool, used to manage information, dependencies, and other things for a project.

It has two mechanisms for adding the dependencies of other modules or projects. One is the dependencies tag and the other is the dependency management tag. People often wonder what the difference between the two is, and when to use which.

First of all, we should have an idea of what a multi-module application is, because that is where the difference between the two shows up.

A multi-module project is, as its name suggests, a project that consists of multiple modules, where a module is a project. You can think of a multi-module project as a logical grouping of sub-projects. The packaging of a multi-module project is “pom” since it consists of a pom.xml file but no artifact (jar, war, ear, etc).

[Figure: multimodule-web-spring_projects]

That’s the technical story, but it is not great if we are trying to teach someone. Concepts should be as simple as a story for a 12-year-old child. So let’s understand dependencies and dependency management with one.

There is a man called Peter who has two ice cream parlors, Gelatos and Baskin Robbins. Peter has two children, Ron and Seria.

Peter has 2 ice creams in the Gelatos parlor, mango [basic version] and strawberry [moderate version], and 1 ice cream in Baskin Robbins, blackcurrant [high version].

Both Ron and Seria can have all the ice creams that their father has in the Gelatos parlor, but they can have ice cream from Baskin Robbins only if they ask for it. This means they can have blackcurrant only if they specifically ask for it, but they don’t have to mention which version of blackcurrant they need, as their father Peter already knows which kind of blackcurrant ice cream he has.

Here Peter is the parent module, and Ron and Seria are child modules. Gelatos is the dependencies tag and Baskin Robbins is the dependency management tag.


So all the dependencies present in the dependencies tag of the parent will be available to all its child modules. But the dependencies present in the dependency management tag of the parent module will be available to a child only if those dependencies are declared in the dependencies tag of that child module.

So why do we even use dependency management if the dependencies tag passes all the dependencies to the children?

  1. Not all the children need all the dependencies present in the parent module, so it is wise to use dependency management for dependencies that only some children need.
  2. Using dependency management, we can keep the version of any artifact consistent throughout the application. [It helps to maintain the version of an artifact in one place.]
  3. Dependency management controls the version, scope, and exclusions of an artifact in the child modules.

Example –

Parent POM- A

</pre>
<pre><?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://maven.apache.org/POM/4.0.0"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.test</groupId>
    <artifactId>A</artifactId>
    <packaging>pom</packaging>
    <version>1.0-SNAPSHOT</version>
    <modules>
        <module>B</module>
        <module>C</module>
    </modules>

    <dependencies>
        <dependency>
            <groupId>com.external</groupId>
            <artifactId>d1</artifactId>
            <version>1</version>
        </dependency>
    </dependencies>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.external</groupId>
                <artifactId>d2</artifactId>
                <version>1</version>
            </dependency>
            <dependency>
                <groupId>com.external</groupId>
                <artifactId>d3</artifactId>
                <version>1</version>
            </dependency>
        </dependencies>

    </dependencyManagement>
</project></pre>
<pre>

It has 2 child modules, B and C. Parent A has 3 dependencies in total:-

  • d1 [Inside dependencies tag]
  • d2 [Inside dependency management tag]
  • d3 [Inside dependency management tag]

 

 Child Pom B

</pre>
<pre><?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <groupId>com.test</groupId>
        <artifactId>A</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.test</groupId>
    <artifactId>B</artifactId>
    <packaging>pom</packaging>

    <dependencies>
        <dependency>
            <groupId>com.external</groupId>
            <artifactId>d2</artifactId>
        </dependency>
        <dependency>
            <groupId>com.external</groupId>
            <artifactId>d4</artifactId>
            <version>1</version>
        </dependency>
    </dependencies>
</project></pre>
<pre>

It will have access to 3 artifacts

  • d1 [coming from the dependencies of the parent]
  • d2 [coming from the parent's dependency management, as it is mentioned in its own dependencies tag]
  • d4 [coming from its own dependencies tag]

Note:- It will not have the d3 artifact, as d3 is mentioned in the dependency management tag of the parent pom [POM A] but is not declared in the child pom [POM B].

Child Pom C

</pre>
<pre><?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <groupId>com.test</groupId>
        <artifactId>A</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.test</groupId>
    <artifactId>C</artifactId>
    <packaging>pom</packaging>

    <dependencies>
        <dependency>
            <groupId>com.external</groupId>
            <artifactId>d3</artifactId>
        </dependency>
        <dependency>
            <groupId>com.external</groupId>
            <artifactId>d5</artifactId>
            <version>1</version>
        </dependency>
    </dependencies>
</project></pre>
<pre>

Similarly, it will have access to 3 artifacts

  • d1 [coming from the dependencies of the parent]
  • d3 [coming from the parent's dependency management, as it is mentioned in its own dependencies tag]
  • d5 [coming from its own dependencies tag]

Note:- It will not have the d2 artifact, as d2 is mentioned in the dependency management tag of the parent pom [POM A] but is not declared in the child pom [POM C].
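The rules from the example above can be sketched as a small toy model. This is not Maven's real resolution algorithm, only an illustration of the two rules described in this post; the function name `effective_dependencies` and the dictionary layout are assumptions made for the sketch.

```python
# Toy model of the inheritance rules described above (not Maven's real algorithm).

def effective_dependencies(parent_deps, parent_managed, child_deps):
    """Return {artifact: version} visible to a child module.

    parent_deps    -- {artifact: version} from the parent's <dependencies>
    parent_managed -- {artifact: version} from the parent's <dependencyManagement>
    child_deps     -- {artifact: version or None} from the child's <dependencies>;
                      None means the child left out <version>
    """
    resolved = dict(parent_deps)  # parent <dependencies> are always inherited
    for artifact, version in child_deps.items():
        if version is None:
            # The child relies on the parent's <dependencyManagement> for the version.
            if artifact not in parent_managed:
                raise ValueError(f"no version available for {artifact}")
            version = parent_managed[artifact]
        resolved[artifact] = version
    return resolved


# POM A from the example
parent_deps = {"d1": "1"}
parent_managed = {"d2": "1", "d3": "1"}

# POM B and POM C from the example
module_b = effective_dependencies(parent_deps, parent_managed, {"d2": None, "d4": "1"})
module_c = effective_dependencies(parent_deps, parent_managed, {"d3": None, "d5": "1"})

print(sorted(module_b))  # ['d1', 'd2', 'd4']
print(sorted(module_c))  # ['d1', 'd3', 'd5']
```

Managed artifacts that a child never declares (d3 for B, d2 for C) simply never enter the result, which is the whole point of dependency management.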


 

Further Reading – http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html

 

What Is YAGNI?

Developers often debate what YAGNI (You Aren't Gonna Need It) actually is. In the name of YAGNI, developers' hands are often tied whenever they write a new piece of code.

I have worked in both product-based and service-based companies. I personally feel that YAGNI is more appreciated in service-based companies, as in a product-based company any extra feature is helpful.

People back YAGNI by saying “You don’t wipe before you shit.” So true. We should not increase the scope of the problem, as it may open the gates to new bugs. So YAGNI tells us “don’t write any extra/new code unless it is actually required”.

I agree with the last quoted statement. But should it tie our hands when we want to implement a requirement in a way that stays open for extension, or write a piece of functionality in a more generic way so that it can be reused in the near future?

So, my definition of YAGNI is: always write code within the scope of the current requirement, but don’t cry out YAGNI when someone wants to solve it in a generic or extensible way. The code should be closed for modification but open for extension.

Try to solve problems in generic ways that can be reused in the future.

Difference Between Generative and Discriminative machine learning

To understand these two models, we first have to see the difference between joint probability [P(x,y)] and conditional probability [P(x|y)].

Joint probability: p(A and B). The probability of event A and event B both occurring. It is the probability of the intersection of two or more events, and may be written p(A ∩ B). Example: the probability that a card is a four and red = p(four and red) = 2/52 = 1/26. (There are two red fours in a deck of 52: the 4 of hearts and the 4 of diamonds.)

Conditional probability: p(A|B) is the probability of event A occurring, given that event B occurs. Example: given that you drew a red card, what’s the probability that it’s a four? p(four|red) = 2/26 = 1/13. Out of the 26 red cards, two are fours, so 2/26 = 1/13.
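The two card examples can be checked by simply counting cards. The sketch below builds a standard 52-card deck and uses exact fractions; the helper names (`probability`, `is_four`, `is_red`) are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colors = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = list(product(ranks, suit_colors))  # 52 (rank, suit) pairs


def is_four(card):
    return card[0] == "4"


def is_red(card):
    return suit_colors[card[1]] == "red"


def probability(event, cards):
    """Fraction of the given cards for which the event holds."""
    return Fraction(sum(1 for card in cards if event(card)), len(cards))


# Joint probability: count red fours among all 52 cards.
p_four_and_red = probability(lambda card: is_four(card) and is_red(card), deck)

# Conditional probability: restrict to the red cards first, then count fours.
red_cards = [card for card in deck if is_red(card)]
p_four_given_red = probability(is_four, red_cards)

p_red = probability(is_red, deck)

print(p_four_and_red)          # 1/26
print(p_four_given_red)        # 1/13
print(p_four_and_red / p_red)  # 1/13, i.e. p(A|B) = p(A and B) / p(B)
```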


A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal? Let’s say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution p(x,y). You ask it “what’s the likelihood this or that class generated this instance?” and pick the class with the higher probability.

A discriminative algorithm does not care about how the data was generated; it simply categorizes a given signal. A discriminative model learns the conditional probability distribution p(y|x), which you should read as the probability of y given x. A discriminative algorithm uses the data to create a decision boundary, so you ask it “what side of the decision boundary is this instance on?”

The fundamental difference between discriminative models and generative models is:

  • Discriminative models learn the (hard or soft) boundary between classes
  • Generative models model the distribution of individual classes

Given an input data point x, the aim is to predict a continuous (regression) or discrete (classification) output y. That is, given x, we are interested in modeling p(y|x). Two approaches to this are:

1. Generative Models:
One way is to model p(x, y) directly. Once we do that, we can obtain p(y|x) by simply conditioning on x. We can then use decision theory to determine class membership, i.e. we can use a loss matrix, etc. to determine which class the point belongs to (such an assignment would minimize the expected loss). For example, in the Naive Bayes model, you can learn p(y), the prior class probabilities, from the data. You can also learn p(x|y) from the data using, say, maximum likelihood estimation (or a Bayes estimator, if you prefer). Once you have p(y) and p(x|y), p(x, y) is not difficult to find: p(x, y) = p(x|y) p(y).
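A minimal sketch of the generative recipe, assuming a single discrete feature and estimating the probabilities by counting (maximum likelihood). The toy data set and the feature values "striped" and "plain" are invented for illustration only.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical training data: (feature x, label y) pairs.
data = (
    [("striped", "cat")] * 3 + [("plain", "cat")] * 1
    + [("striped", "dog")] * 1 + [("plain", "dog")] * 3
)

n = len(data)
label_counts = Counter(y for _, y in data)
pair_counts = Counter(data)


def prior(y):
    """p(y), estimated as the fraction of examples with label y."""
    return Fraction(label_counts[y], n)


def likelihood(x, y):
    """p(x|y), estimated from the examples that carry label y."""
    return Fraction(pair_counts[(x, y)], label_counts[y])


def joint(x, y):
    """p(x, y) = p(x|y) * p(y)."""
    return likelihood(x, y) * prior(y)


def posterior(y, x):
    """p(y|x), obtained by conditioning the joint distribution on x."""
    evidence = sum(joint(x, label) for label in label_counts)
    return joint(x, y) / evidence


def classify(x):
    """Pick the label with the highest posterior probability."""
    return max(label_counts, key=lambda y: posterior(y, x))


print(joint("striped", "cat"))      # 3/8
print(posterior("cat", "striped"))  # 3/4
print(classify("plain"))            # dog
```

With equal misclassification costs, picking the label with the highest posterior is the decision rule that minimizes the expected loss.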

2. Discriminative Models:
Instead of modeling p(x, y), we can directly model p(y|x). For example, in logistic regression p(y = 1|x) is assumed to be of the form 1 / (1 + exp(-Σ_i w_i x_i)). All we have to do in such a case is to learn the weights w_i that minimize the log loss (equivalently, maximize the conditional likelihood of the training labels).
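As a rough sketch of the discriminative route, the code below fits a tiny logistic regression with plain stochastic gradient descent on the log loss, the usual training objective for this model. The bias term, the learning rate, the number of epochs, and the one-dimensional toy data are all assumptions made for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Model of p(y = 1 | x): the sigmoid of a weighted sum of the features."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(points, labels, learning_rate=0.5, epochs=500):
    """Learn weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            # Derivative of the log loss with respect to the weighted sum.
            error = predict_proba(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical 1-D data: small values belong to class 0, large values to class 1.
points = [[0.0], [1.0], [3.0], [4.0]]
labels = [0, 0, 1, 1]

weights, bias = train(points, labels)
print(predict_proba(weights, bias, [0.0]))  # below 0.5, so class 0
print(predict_proba(weights, bias, [4.0]))  # above 0.5, so class 1
```

Note that nothing here describes how x itself is distributed; the model only learns where the boundary between the classes lies.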

Generative models often outperform discriminative models on smaller datasets because their generative assumptions place some structure on your model that prevents overfitting. For example, let’s consider Naive Bayes vs. Logistic Regression. The Naive Bayes assumption is of course rarely satisfied, so logistic regression will tend to outperform Naive Bayes as your dataset grows (since it can capture dependencies that Naive Bayes can’t). But when you only have a small data set, logistic regression might pick up on spurious patterns that don’t really exist, so Naive Bayes acts as a kind of regularizer on your model that prevents overfitting. There’s a paper by Andrew Ng and Michael Jordan on discriminative vs. generative classifiers that talks about this more.

Whenever an algorithm involves assuming, calculating or estimating the distribution of the inputs x (for example through p(x|y) and p(y)), it is generative. Simply put, if the algorithm cares about how x is distributed, it is generative; if it only models y given x, it is discriminative.

Now a small story to tell your 12-year-old kid, so that they can also understand the difference between these two models.

Let’s say you have two kids, “Gen” and “Dis”, who have never opened their eyes since birth. Today is the first day they will open their eyes, and you want to celebrate the occasion by teaching them the difference between a cat and a dog. You take them to a pet store nearby.

Before showing them around, you tell Gen and Dis to pay special attention to the color, size, eye color, fur length, voice, etc. (the feature set) of the pets they are going to see. At the end of the visit, you want to check whether they understood the difference between a cat and a dog.

Now you give Dis two photos, one of a cat and one of a dog, and ask which one is which. Dis has meticulously written down several conditions, like: if the voice sounds like a meow, the eyes are blue or green, and the fur has brown or black stripes, then the animal is a cat. Thanks to her relatively simple rules, she quickly works out which one is a cat and which one is a dog.

Now, instead of giving Gen two photos, you give her two blank pieces of paper and ask her to draw what a cat and a dog look like.

Given any photo, Gen can also tell which one is a cat and which one is a dog, based on the drawings she created. In most cases, though, drawing the cat and the dog was unnecessary and time consuming for the task of detecting which one is a cat.

But suppose there were only a few dogs and cats for Gen and Dis to look at (low training data). In such a case, if you show them a photo of a brown dog with stripes and blue eyes, there is a chance that Dis would mark it as a cat, while Gen has her drawing and can better detect that the photo is of a dog.

If you ask Gen to pay attention to more things (features), she will create a better sketch. But if you show them more examples (a bigger data set) of cats and dogs, Dis will mostly be better than Gen.

Since Dis is very meticulous in her observations, if you ask her to pay attention to more features she will create more complicated rules (overfitting), and the chance of wrongly identifying a cat or a dog will increase. That would not happen as easily with Gen.

What if, before going to the pet store, I don’t tell them that there are only two types of animal (no labeled data)? Dis would fail completely, because she would not know what to look for, while Gen would be able to draw sketches anyway. This is sometimes a huge advantage (semi-supervised learning).

Now let me reveal the suspense, which you might already have guessed: Dis stands for discriminative and Gen stands for generative.


MicroService – Brief Introduction

This post focuses on what microservices are, why they are so popular these days, what their positive and negative aspects are, and which areas we will try to cover in future posts.

In addition, I will share a few YouTube links that are quite helpful for understanding this concept. As a developer, I would say it is just one way of packaging or modularizing code.

First of all, I would like to tell you that I am inclined toward microservices, so most of the things you find here will be in favor of them. But I will also discuss the challenges you might face when you follow the awesome journey of microservices.

What is a Microservice?

Rather than going for a definition, we will try to find the common characteristics of microservices.

In simple terms, you can think of a microservice as a very small independent project capable of performing its whole task on its own. So if a monolith has various responsibilities, make each of these responsibilities a separate service. But a responsibility does not have a concrete boundary, which gives rise to a new question: how big or how small should a microservice be? Some people say it should be small enough to be handled by one developer; some say it should not be more than a few hundred lines of code. To settle this, we will call a service a microservice if it has a few of the properties/characteristics mentioned below.

Characteristics of MicroServices

  • Can be upgraded or rewritten independently.
  • Has fault tolerance and monitoring mechanisms.
  • Each service is a complete product.
  • Should have its own data management.
  • Should be easily replaceable.
  • Should only expose endpoints to dependent services.

 

What is all the fuss about, and why is it becoming so popular these days?

As we all know, it is simpler to solve a number of small problems and then join the solutions to solve the bigger problem. This is what we call divide and conquer. The only problem with this approach is that we need a very good merge technique to have a successful solution. With today's DevOps tooling (Docker, Kubernetes, Mesos), it has become possible for developers to manage a large number of services and their deployment. It has also helped to increase resource utilization, which has further decreased the cost of maintaining multiple services.

Pros

  • It breaks the problem into smaller problems, helping developers solve each one more accurately and in a more optimized manner.
  • Partial deployment and partial upgrade of the application are possible.
  • Helps to reduce development time.
  • We can easily rewrite any service.
  • Services can be written in any programming language, so we can easily try a new programming language.
  • Easier to find and fix bottlenecks in the system.
  • Maintenance and bug fixes can be simpler.
  • Testing becomes more focused, as smaller units are tested independently.
  • CI/CD can be implemented easily.
  • Preserves modularity.

Cons

  • It increases DevOps activities.
  • The organization has to monitor and handle a large number of services.
  • Service discovery and request tracking can be tedious tasks.
  • Needs a change in organization culture.
  • Developers have to work more closely with DevOps in order to make development more streamlined.
  • Needs advanced DevOps.
  • Since it is fairly new, not everyone has a clear idea of what the boundary of a microservice is.


 

Topics to be covered in upcoming blogs

  • Modules vs microservices.
  • How to share a domain between different microservices.
  • Should we have a common DAO [database] layer for every service, or should each service have its own DAO layer?
  • Should all microservices live in a single Git repo, or should each have a separate Git repo?
  • How to monitor microservices.
  • How to track the flow of any particular request in real time between services.
  • Containerization of microservices.
  • Creating a self-deployed environment of containerized services.
  • When to go for microservices.
  • How to shift from a monolith to microservices.
  • How to perform integration testing involving a couple of microservices as dependencies.
  • Distributed logging for services.

Hashing And Different Techniques

What is hashing and why do we need it?

Consider it as a process of converting any input to an integer value. A string can have a hash value, and a Java object can have a hash value.

We need it to place these inputs into a particular cell or bucket, so that whenever we need to find an object, we only have to look in that particular cell or bucket. This decreases the search time for an input object.
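The idea can be sketched in a few lines. The hash function below is a simple polynomial hash over the characters of a string, and the number of buckets (8) is arbitrary; both are assumptions chosen for illustration, not the hash used by any particular library.

```python
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]


def simple_hash(text):
    """Convert a string to an integer with a polynomial hash (illustrative only)."""
    value = 0
    for ch in text:
        value = (value * 31 + ord(ch)) % (2 ** 32)
    return value


def bucket_index(key):
    """Map the hash value onto one of the available buckets."""
    return simple_hash(key) % NUM_BUCKETS


def put(key):
    bucket = buckets[bucket_index(key)]
    if key not in bucket:
        bucket.append(key)


def contains(key):
    # Only one bucket is searched, not the whole collection.
    return key in buckets[bucket_index(key)]


put("apple")
put("banana")

print(simple_hash("ab"))  # 3105, i.e. 97 * 31 + 98
print(contains("apple"))  # True
print(contains("mango"))  # False
```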


Good Hashing Techniques

Since we need to convert the input value to some integer value, the method for converting it should be simple and fast, as a user should not spend much time and effort on this secondary task. The second thing to keep in mind is that your function should produce different int values for different inputs in most cases. Otherwise, you will end up keeping the elements in a single bucket or only a few buckets, which will make searching slower.

Two things you have to learn are:

  1. Even Distribution and Easy Computation
  2. Collision Detection and Resolution

Collision resolution techniques:

  1. Linear Probing
  2. Quadratic Probing
  3. Double Hashing

This is the minimum of hashing that everyone should know. I will try to cover hashing in depth in my upcoming blogs.
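A minimal sketch of how the three probing strategies choose the next slot after a collision, assuming non-negative integer keys and a small table of 7 slots. The second hash used for double hashing, 1 + key % (capacity - 1), is one common textbook choice and is an assumption here, as are all function names.

```python
def linear_probe(key, attempt, capacity):
    """Try the home slot, then the next slot, then the one after that, and so on."""
    return (key % capacity + attempt) % capacity


def quadratic_probe(key, attempt, capacity):
    """Jump by the square of the attempt number instead of one slot at a time."""
    return (key % capacity + attempt * attempt) % capacity


def double_hash_probe(key, attempt, capacity):
    """Use a second hash of the key as the step size between attempts."""
    step = 1 + key % (capacity - 1)
    return (key % capacity + attempt * step) % capacity


def insert(slots, key, probe):
    """Place key in the first free slot along its probe sequence."""
    capacity = len(slots)
    for attempt in range(capacity):
        index = probe(key, attempt, capacity)
        if slots[index] is None or slots[index] == key:
            slots[index] = key
            return index
    raise RuntimeError("no free slot found")


def contains(slots, key, probe):
    """Follow the same probe sequence until the key or an empty slot is found."""
    capacity = len(slots)
    for attempt in range(capacity):
        index = probe(key, attempt, capacity)
        if slots[index] is None:
            return False
        if slots[index] == key:
            return True
    return False


# 3, 10 and 17 all hash to slot 3 in a table of 7 slots, so they collide.
slots = [None] * 7
placed = [insert(slots, key, linear_probe) for key in (3, 10, 17)]

print(placed)                                           # [3, 4, 5]
print([quadratic_probe(10, i, 7) for i in range(4)])    # [3, 4, 0, 5]
print([double_hash_probe(10, i, 7) for i in range(4)])  # [3, 1, 6, 4]
```

A prime table size is used so that the double-hashing step is always coprime with the capacity and the probe sequence can reach every slot.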